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ON PRELIMINARY SAMPLING WITH SPECIAL 
REFERENCE TO THE MICROBIOLOGICAL CONTROL 
OF TOMATO CONCENTRATES 


©. Evreté anp K. SarKapI 
Hungarian Central Statistical Office and Mathematical Institute of 
the Hungarian Academy of Sciences, Budapest, Hungary 


INTRODUCTION 


In cases where the lots of merchandise are accepted by a fixed 
sampling procedure which is carried out only after delivery it is often 
suitable for the vendor to sample the lots before delivery in order to 
diminish cost of delivery of lots refused in fi’ 11 inspection. This paper 
deals with the problem of choosing the critical value in the preliminary 
sampling procedure if the conditions of the final sampling are known. 
We consider the case of a single parameter and make some reasonable 
assumptions. Section 2 gives the minimax solution of the problem. 
The solution given in Section 3 is based on a boundary condition for 
the probability of one type of error. Section 4 gives Bayes’ approach 
to the problem. 

The problem has arisen in connection with the preliminary control 
of tomato products. A lot of tomato puree is accepted by the consumer* 
if the Howard count found in the sample taken of the lot in question 
does not exceed a given limit [4], usually 40 percent. A question of 
considerable general practical importance has arisen: What limit (pre- 
liminary limit) is to be used by the vendor before delivering the lots 
in order to retain the bad lots which would be refused in the final 
acceptance control? 

In the following we treat the mathematical theory in connection 
with the tomato problem. Of course the theory can be applied more 
generally to similar practical problems. 

We assume that the quality of the lots can be characterized by a 
- single parameter @ (e.g. the expected value of the Howard count in 
the lot). Let P,(6) denote the probability of accepting a lot in the 


*The word “consumer” is used here in its general sense, denoting manufacturers who buy the 
puree for further processing. 
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final sampling procedure, and P,(/, 6) = P,(, < #), the distribution 
function of the sample statistic ~; on which the preliminary decision 
is based. The functions P,(6@) and P,(¢, @) are assumed to be known. 

In our case, if mold figures would appear independently in the 
microscopic fields, P,(#, 6) and P.(@) could be evaluated on the basis 
of the binomial distribution (@ being the expected value of the Howard 
count). This independence, however, cannot be expected, because of 
the method of examination. The so-called Howard slides with 25 
fields are used; the appearance of mold figures in the neighbouring 
fields will probably be positively correlated and thus the distribution 
of the Howard count will have higher variance than expected on the 
basis of the binomial distribution. This has been shown, in fact, by 
the experiments undertaken by the Hungarian Institute for Research 
in Canning, Meat Packing, and Refrigeration and the Hungarian 
Control Station for Tomato Products. Here some lots have been 
investigated exhaustively. From each lot 500 samples were taken 
and in every sample 100 microscopic fields were inspected. The distri- 
bution of the Howard count was found to be of considerably higher 
variance than could be expected on the basis of a binomial distribution 
4]. It seemed to be suitable to carry out a homoscedastic transfor- 
mation (cf. [1]). This transformation is given in Table 1. The trans- 
formed Howard count was found to be nearly normal with constant 
variance-1. Thus we apply the normal approximation. If we suppose 
both in the preliminary control and in the final control that the number 


TABLE 1 
Homoscepastic TRANSFORMATION OF THE HowarpD Count 
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of inspected microscopic fields to be 100 we have 


P(0) = o(y — 8) 
and 


P\(t, 0=) = — 6) 
where 


t 
and y = 12.66 corresponds to the original limit of 40 percent used in 
the final acceptance test (more precisely the transformed value of 
40.5 percent as correcting for continuity). 6@ denotes the expected 
value of the transformed Howard count. 

We may choose the number of microscopic fields to be inspected 
in the preliminary sampling to be larger than in the final one. If we 
prescribe the inspection of n .100 fields, we may use the mean of the 
n transformed Howard counts as sample statistic in the preliminary 
sampling. In this case 


P,(t, 0) = — 6) Vn. 


The normal approximation will facilitate the numerical calculations 
but the results of our paper may be applied without it as well. Re- 
garding the form of the functions P,(@) and P,(#, @) it must be assumed 
only that they are monotonically decreasing with 6. 

Our task is to determine for the preliminary sampling the most 
suitable limit x of the critical region ~, > x. 

In the following let us denote by w, the loss caused in the course 
of the preliminary inspection by the incorrect decision of retaining 
a lot which would have been accepted at the final inspection and by 
wz the loss caused by the decision to ship a lot which will be refused. 
Thus in our case w, is the loss caused by the fact that a lot will be 
regarded as having inferior quality, though it would be accepted by 
the consumer in final acceptance procedure; w, includes the costs of 
delivery to the consumer and back to the vendor and it may include 
the moral loss too if this can be reasonably estimated. 


2. THE MINIMAX SOLUTION 


We wish to diminish the loss caused by the wrong decision in the 
preliminary sampling. As the loss is a random variate, its expected 
value 


r(x, 6) = wP,(z, — P.(@)] + will — P,(x, (1) 
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should be minimized. But r(z, @) depends on the unknown quantity @. 

In the present section we assume that we have no information about 
6. In such cases the minimax method can be applied. This is based 
on the principle that the expected loss in case of the most unfavourable 
value of the parameter is to be minimized. (Of course, this principle 
is somewhat pessimistic). For any z, the most unfavourable value 
. Of @ is that where r(z, 6) takes its maximal value max, r(z, @) and thus 
we want to determine that value x, of z for which there exists a 6, 
such that 


, = min max r(z, 6). (2) 


(For the sake of simplicity, we assume that max, r(z, 6) and 
min, max, r(z, 6) exist). 
As will be shown below, if there exists a solution of the system of 
equations 
= w,/(w, + we) 
, = wi/(wi + wr), 
then this is a solution of the minimax problem too. Let us substitute 
the value of P,(6) from (3) into (1). Then we see that 
r(z, = w.w,/(w, + 
does not depend on z, i.e. 


(3) 


= , 06) (4) 
for every value of z. 

Further, by virtue of the supposed monotonicities, we can write 
, 6) = , + u and P,(@) = + v where u and v 
have the same sign as 6 — 6. Consequently using (3) and (1) we 
obtain 


, 0) = + — (w, + 
Since w > 0, we have 


, 6) r(x , bo). (5) 
On the other hand, since 


max r(z, & r(z, 6), 
from (4) it follows that 
min max r(z, 6) 2 minr(z, 6) =r(z0,%). 
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Further, using (5), we obtain 


min max r(x, 6) S max , 6) = r(xo , (7) 


From (6) and (7) follows (2), i.e. x , 0 are in fact the solution 
of the minimax problem. 

The system of equations (3) has always a solution if the functions 
P,(x, 6) and P,(@) are continuous, and the latter varies from 1 to 0; 
in this case, max, r(x, 6) and min, max, r(z, @) also exist. It can be 
shown, however, that no real difficulty arises if we suppose only the 
monotonicities mentioned in the Introduction. 

In the special case in which both statistics used in the two sampling 
procedures are normal, more precisely if 


P,(0) = and P,(z, 6) = (8) 


(where o, and o, are known), then from (3) we obtain 


\ 

r= yt + (— =). 

Thus in the case of the transformed Howard count, where o, = 1, 
o, = 1/Vn, we obtain 


z= y+ (1+ 


W, + We 
3. SOLUTION BASED ON BOUNDARY CONDITION 


In practice it is often difficult to state the values of w, and w, . 
Therefore a simpler starting principle would be of practical interest. 
In the case of the common tests of significance, the level of the error 
of the first kind is limited by some prescribed value. Similarly in 
the present problem we may require that the joint probability of 
accepting a lot in the preliminary sampling and rejecting the same 
lot in the final one is to be limited from above by a prescribed constant 
a. In other words 


P,(z, — P2(6)] S (9) 


The problem éan be solved in general only numerically. In the 
case of the normal distribution [cf. formula (3)] (9) has the form 


a0) 


The solution can be reached by determining the value of 6 = 4 


} 
1. 
4 
j 


344 BIOMETRICS, SEPTEMBER 1960 


which maximizes the left-hand member of (10). By differentiation, 
we obtain 


te fo) —#) (11) 
0; G2 

where Z(#) designates Mills’ ratio, ic. Z(t) = $(f)/¢’(t). We obtain 

the largest value of x satisfying the inequality (10) hy solving simul- 

taneously equation (11) and the equation 


aa 


This can be done numerically with the aid of the table of the normal 
distribution function and that of Mills’ ratio which can be found in [2]. 

In the special case ¢, = o, = o@ it follows from (11), by virtue of 
the monotonicity of Mills’ ratio, that 6 = (xz + y)/2 and thus we 
obtain from (12). 


z= y + (Va). (13) 


In the case of the transformed Howard counts, (11) and (12) have 
the forms 


— 0) — y) =a 
and we get for the case n = 1 
z= y+ 


Table 2 contains the numerical values of z for n = 1, 2, 3, 4 and 
for 1 — a = .95 and .99. For the case n = 1, we give the retrans- 
formed values of x too (italicized) which in this case can be immediately 
compared with the actual values of the Howard count in the preliminary 
sampling. The discrepancies between these values and those of Table 7 
of [4] are due to the fact that in [4] the homoscedastic transformation 
was not applied. 

We note that in applying the method described in this section to 


- @ series of lots we have ensured that the number of lots which will be 


refused in the second control will be small compared with the number 
of all lots inspected in the first control. This, however, does not mean 
that most of the forwarded lots will be accepted in the final examina- 


tion. Jf the quality is bad, the number of all forwarded lots will be 


small but these lots, or most of them, will be as bad as the retained 
onea. “ This’mmeans that, if the quality is known to be bad, this must 
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TABLE 2 
Limits ACCORDING TO ForMuLA (13) 

n 95 .99 

1 11.14 10.10 

.80 
2 11.32 
3 11.36 10.52 

4 11.38 10.56 


be taken into account in the preliminary examination. The method 
described in the following section gives such a possibility. 


4, BAYES’ APPROACH 


In practice, before deciding upon the delivery of a certain lot, we 
know the Howard counts on a number of lots issued from the same 
factory or consignment. Usually these data show some degree of 
homogeneity. If the amount of data is sufficiently large, they can 
offer a basis for assuming an a priori distribution. This raises the 
possibility of applying Bayes’ method wherein the information arising 
from the consignment data can be taken into account. In the following 
discussion we assume that the parameter itself can be regarded as a 
random variable with known a priori distribution. Let ¢ be the 
characteristic variable of the final acceptance, i.e. ¢2 = 1 if the lot 
is accepted in the final procedure (or it would be accepted if sent) 
and otherwise ¢, = 0. According to our assumption, the joint distri- 
bution of £, and ¢ is completely known. We assume P(t, = 1 | £,) 
to be monotonically decreasing with £, . 

We have to minimize the expression 


wP(é, 2, &=0)+wPé 1). (14) 
If there exists a value x, for which 
P(S2 = 1 | = Xo) = w2/(w; + wr), (15) 


or at least (in case of discontinuity) 


Plt, = 1|& = 2) + w) for 
=> w2/(wi + w.) for «<2. 
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(for values of z for which the left-hand member exists), it is easy to 
see that z, defined by (15) minimizes the expression (14). Let 


0, if & >2z, 
i 
Then obviously 
= &) + will — = 1] &) 
= = O | = zo) (16) 


+ — = 1 | & = Zo). 


Taking expectations on both sides of (16), we see that zx = 2 
minimizes (14). 

If the left-hand member of (15) never gets greater, or never gets 
less, than the right-hand side, then obviously all lots having the given 
a priori distribution are to be sent, or all are to be retained, respectively. 

If w, and w, are not known but we wish to retain lots which have 
an a posteriori probability < a for acceptance, then we must put a 
instead of w,/(w, + we) in (15). 

Consider the special case when the conditional distributions of £, 
and ¢, with respect to the parameter @ are N(@, o,) and N(6, a2) re- 
spectively, where é is the statistic applied in the final sampling pro- 
cedure, and @ is itself a random variate having the distribution N (u, oo). 
In this case the vector variate (¢, , —) is normally distributed with 
expectation (4, and variance matrix 


+o; | 
% 


Thus, we get from (15), 
Van + + o902 + $"(a) 


2 
Go 


m= y+ Sy (17) 


where y is the limit of the critical region ¢, > y used in the final sampling 
procedure. 

The expected fraction of the lots rejected in the final procedure 
from the lots sent according to (17), i.e. 


P@ Sz) = S2,& > y)/P S 2) 
can be determined with the aid of the table of the two-dimensional 
normal distribution ([3] or [2]). 


_ In the case of the transformed Howard count, n = 1, o:.= 0, = 1 
and thus 


} 
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t= yt (a) (18) 


yr wy V (oo + + 1) 

mw and o, can be determined on the basis of the consignment data: 
the mean and variance of the empirical distribution are to be equated 
with » and 1 + o% respectively. 

Example. In two consignments of tomato puree, a sample was 
| taken from every lot. The Howard counts were determined and trans- 
formed. The distribution of the transformed Howard counts was found 
to be nearly normal with expectations 11.02 and 9.32 and with variances 
1.7 and 1.8 respectively. Regarding these values as u and 1 + ao? , 
we obtain from (18) the limits given in Table 3 for the Howard counts 
(retransformed values italicized). 


TABLE 3. 
Limits According to Formula 18 
l-e 
Consignment 

95 .99 
1 10.26 8.30 

16 
2. 12.39 10.55 

38 .26 
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EXPERIMENTAL DESIGN IN THE ESTIMATION 
OF HERITABILITY 
BY REGRESSION METHODS 


B. D. H. Larrer* anp ALAN ROBERTSON 
Institute of Animal Genetics, Edinburgh, Scotland. 


Crudely speaking, we have two different kinds of technique for the 
estimation of heritability, based on analysis of variance and regression 
coefficients respectively, though in some special cases we can cast the 
analysis in either form. The problems of design in the former have 
recently been discussed (Robertson, [1959]). In this paper, we shall 
be concerned with experimental design in regression methods—in parti- 
cular that of offspring on parent. 

We are then concerned to obtain the most accurate estimate of 
the regression coefficient (and of the heritability that derives from it) 


_ for a given expenditure of effort. This effort may be perhaps in time 


of measurement (as is often the case in Drosophila or in a carcass 
dissection in ‘pigs) or in money expended in the rearing of the animals 
(as would be the case for growth rate in beef cattle). We shall make 
the general assumption that, if n offspring are to be measured in each 
family, then the expenditure of effort can be considered as proportional 
to n + a where a is a constant depending on the particular circum- 
stances. We are then concerned to find the optimum value of n. 

In general, it is known that if a regression is based on f observations 
(each observation corresponding here to a family) then the sampling 
variance of the regression coefficient is given by 


‘where o? is the variance of deviations of the dependent variate (in 


this case the family mean) about the regression line, o2 is the variance 
of the independent variate (in this case the parental measurements) 


. and f the number of families. 


Now the deviation of observed family means about the regression 
line will, a priori, be made up of two parts: the first part will be that 


*Present address: Division of Plant Industry, C.S.I.R.O., Canberra, Australia. 
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of true family means (such as would be based on a large number of 
individuals within each family) and the second is due to sampling 
within families and is then inversely proportional to family size. The 
number of families { will be equal to the total effort 7’ divided by 
n + a. In all cases that we shall deal with we shall find that V(b) 
is of the form 


= [8 + + @)] — 3} 


where @ and y are functions of h’, the heritability. Then the optimum 
value of n, for the minimum value of V(b), is given as a first approxi- 
mation by 


no = ay/B 
and as a second approximation by 
= No — + a)*/2Ta]. 
REGRESSION OF OFFSPRING ON SINGLE PARENT 


We have in this case to distinguish three different situations de- 
pending on the other parent: 

(a) each offspring being only related through the measured parent 
as, for example, in the regression of son on sire for rate of gain in beef 
cattle. The offspring of each measured parent form a half-sib group 
and the correlation between the parent’s measurement and the true 
group mean is h, so that the measure of deviations of true group means 
from the regression line is o2(1 — h”)/4. 


V(b) = — h?)/4] + — + a)] — 3} 
= — h*)/4] + — + a)] — 3}. 
The first approximation is then 


The optimum value of n is then highest at the extreme values of h’. 
If h’ is low, the true family means will be close to the regression line 
because there is so little additive genetic variation—if it is high, they 
will do the same because the parent’s measurement is a good guide 
to the true family mean. If measurement is the main effort required, 


‘then a = 1 and we have for low values of h’, n» = 2/h. A plot of 


V (6) Sgainst n is shown in Figure 1 for 7 = 1,000, h? = 0.25 and 
a = 1. It will be seen that it is better to err on the side of having too 
large a family size. 
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v (8) 


FAMILY SIZE 


FIGURE I 


THE SAMPLING VARIANCE OF THE REGRESSION OF HALF-SIB OFFSPRING ON SINGLE 
PARENT. 7' = 1000, a = 1, h? = 0.25 


(b) each measured parent has only one mate, so that the offspring 
are now full sibs. 


The correlation between parental measurement and true group mean 
is now h/+/2, so that we have 


V(b) = {[o5(1 — $h?)/2] + — + a)] — 3} 


= {4h*(1 — $h’) + [(1 — 4h’) /n}}/{[T/@ + @)] — 3} 
giving 
ng = a/$h’. 

The optimum size of group is now reduced for low heritabilities by 
a factor of 1/2 and does not now rise as h’ approaches one. However 
large the family size, there is still some variation of family mean about 
the regression line and the optimum 7 values are smaller than in the 
previous case. 

(c) all measured parents have the same mate (e.g. in an intra- 
sire daughter-dam regression). 

The groups of offspring are then full-sibs within a half-sib family 
and the correlation within sires of parental measurement and true 


. family mean is again h. Then 


V(b) = — h®)/4] + — + a)] — 3} 
= — h*) + — + — 3} 
giving 
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h’) 

The optimum value of n is thus close to that in case (a) except when 
h’ is high, when it is a trifle less. If more than one sire is used, the 
somewhat more complex case of allocation of progeny to both sire and 
dam is determined primarily by practical considerations of how many 
dams can be mated to each sire and how many offspring can be obtained 
from each dam. In theory the best structure would be to have only 
one sire but this would raise questions about the generality of such a 
regression coefficient. One might therefore lay down a priori that at 
least S, sires shall be used. The general argument as to the best struc- 
ture is involved but can be summarized briefly as follows:— 

(i) The ideal structure would have S, sires with the optimum dam 
family size determined as above. 

(ii) If the possible dam family size is below the optimum, it should 
be made as large as possible, with S, sires. 

(iii) If the number of dams that can be mated to each sire is below 
the optimum, then the maximum should be accepted and the number 
of sires used increased so that the dam family size is optimum. 


REGRESSION OF OFFSPRING ON MIDPARENT VALUE 


Each family is now a full sib group and the correlation between 
midparent value and true family mean is h. We then have 


V(b) = — h*)/2] + — [T/(m + a)] — 3} 
= — h’) + [(1 — + @)} — 3}, 
which has as its first approxi.nation to the optimum value 


fi — 3h’) 
— bh’) 
If we are concerned solely with the effort of measurement than a = 2, 


and the optimum family size is exactly the same as in the intra-sire 
daughter-dam regression method. 


n 


THE INCREASE IN INFORMATION AT THE OPTIMUM 


_ If T is large, it was shown that, in the general terminology used 
earlier, the optimum is given by nj = ay/8. We may contrast the 
variance at the optimum with that when one offspring is measured 
per family. We have then for the relative efficiency; at the optimum, 


E = (@ + 1)(6 + y)/[a8 + + 2(e6y)'””) 
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This can be reduced without undue error to (a + 1)/[1 + (2a8)'”’]. 
The gain in efficiency will then be greatest when h’ is low and will never 
fall below 30 percent, for the single parent methods, or 40 percent 
for the regression on mid-parent, if measurement alone is the effort 
required. 

DISCUSSION 


Any analysis in which a parent and a group of offspring are measured 
can furnish information on heritability both from the regression of 
offspring on parent and from an analysis of variance within and between 
offspring groups provided that the parents are chosen at random. It 
was shown in an earlier paper that the optimum structure for the 
latter determination has a family size which is equal to the reciprocal 
of the intra-family correlation coefficient, i.e. 4/h? for half-sib and 
2/h’ for full-sib families. It was then assumed that the effort was 
proportional to the number of progeny measured. It will be seen 
that these values are of the order of the square of the optimum values 
for the regression method. However, in view of the sharp rise in the’ 
sampling error of the estimate from the analysis of variance as the 
family size declines, it would be wise to aim at a group towards the 
larger figure if possible. 

It may happen that there is an environmental cause of similarity 
between members of a family group. This will, of course, make worth- 
less the heritability estimate obtained from the analysis of variance, 
but not that obtained by the regression method. However, this will 
necessarily increase the variance between family groups and the value 
of 8 in the general formulation of the problem. The optimum family 
size will consequently be reduced. 

In the paper on the optimum structure in the estimation of herita- 
bility by the analysis of variance, it was shown that, for two characters 
of similar, heritability, the optimum structure for the estimation of 
the genetic correlation between the two was the same as for the estima- 
tion of heritability. In the case of parent-offspring regressions, Reeve 
[1955] has developed the necessary formulae. These simplify when 
the phenotypic and genetic correlations are the same to 


for the single-parent half sib case. The optimum value of n will be 


given by 
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which will give similar values to that for the heritability estimate unless 
h’ is large. 


SUMMARY 


The optimum family size is discussed in the determination of the 
heritability by offspring-parent regression methods. The optimum 
size is smaller than that for the determination of the heritability from the 
analysis of variance, being inversely proportional to the square root of the 
heritability. As with the latter method, the best design for the measure- 
ment of the genetic correlation between two characters is almost identical 
with that for the determination of their heritability. 
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THE ANALYSIS OF A CATCH CURVE 


D. G. Cuapman! ann D. S. Rosson 
University of Washington and Cornell University, 
Seattle, Washington and Ithaca, New York, U.S. A. 


1. Summary 


The age distribution of a random sample, or the “catch curve’’ 
from a stationary animal or fish population, provides information on 
the annual survival rate of the population. The usual assumptions 
imply that the age distribution in such a population is geometric. The 
best (minimum variance unbiased) estimate of the annual survival rate 
is X[{1 + X — (1/n)]"', X being the mean age and n the sample size. 
This estimate is compared to the so-called “Jackson” estimate and 
some modifications of Jackson’s estimate. Assumptions basic to re- 
gression estimates of the survival rate are considered. 

A test of the model is given for the particular case where it is 
suspected that there may be selection against younger ages. With 
the geometric age distribution there is no unbiased estimate of 7, the 
instantaneous mortality rate, but a nearly unbiased estimate is given by 
i* = In{1 + — (1/n)] — X — [(n — 1)(n — 2)/n(t+ 1) (n+ — 1] 
where ¢ = nX. 


2. Introduction 


The use of “catch curves,’ or the frequency distribution of the 
catch by age, which goes back to Edser [1908] is now well known in 
fisheries work as a method of yielding information on the vital statistics 
of the fish population. The earlier users worked with age only in- 
directly, using instead some more easily available correlated variable, 
such as length. Improvements in aging techniques now make it pos- 
sible to work with age in almost all studies. A review of some of the 
methods used and of some of the problems are given by Ricker [1958, 
Chapter 2]. 

The aim of the present paper is to give an analytic treatment of 
such «catch curves and to study the statistical problems associated 


1Work done while at North Carolina State College, Raleigh, North Carolina, U. S. A. 
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with them. This seems desirable in view of the increasing importance 
being attached to accurate estimates of mortality rates in the study 
of the dynamics of populations. Furthermore, while catch curves of 
commercially fished populations may be based on samples so large that 
no statistical refinement is required, the research biologist. working 
with limited samples must obtain the maximum possible information 
from them. 

In fact, it is in the latter situation that the catch curve may still 
be important; the fisheries biologist working with a commercially 
exploited population tends to obtain mortality estimates through tagging 
experiments or by comparison of the catch per unit of effort in several 
successive annual catches. These have been discussed by Beverton 
and Holt [1957]. Yet tagging and catch per unit of effort estimates 
of mortality are based on assumptions that may be open to question 
so that it may be desirable to obtain mortality estimates by several 
methods for comparison purposes. 


3. Estimation of the constant annual survival rate. 


We assume that we are dealing with a population in which repro- 
duction occurs annually and which has reached a state of approximate 
equilibrium where the number of births is just sufficient to balance 
out the number of deaths at all ages during the previous year. Total 
population size measured on annual anniversaries therefore remains 
constant, and the population is said to be stationary. This pheno- 
menon has been observed, for example, in fish populations of inland 
waters. In many instances it has also been observed that, above 
some minimum age, the number of fish in each age group diminishes 
geometrically with age, implying that the annual survival rate is con- 
stant over age and time. 

It is also known that many types of fishing gear are selective against 
smaller sizes and hence against younger ages. We assume as known 
the age at which fish are fully available to the sampling gear. Some 
questions connected with this problem are discussed in Section 7. 

We assume then that there is some age X, , such that for all ages 
X = X,, the probability of selection is the same and the annual survi- 
val rate is the same. It is convenient to relabel ages so that X, = 0. 

The assumptions outlined may be subsumed in the statement that 


_the age frequency distribution in the population is given by the function 


f(x) = z=0,1,2,:-- (1) 


where s = annual survival rate, a = 1 — s = annual mortality rate. 
The age X of an individual randomly selected from this population 
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is therefore a random variable having the geometric probability distri- 
bution (1). We consider here the problem of estimating the annual 
survival rate s on the basis of n independent observations on the random 
variable X; that is, on the basis of the ages observed in a random sample 
of n individuals from this population. 

Since the random variables X, , --- , X, are independent, then 
their joint probability distribution is 


, 
from which it is readily seen that the sum 7’ = 7. X, is a sufficient 
statistic for the unknown parameter s. It is known that the distri- 
bution of 7’ has the form: 


It is further easily shown, by reference to the definition, that 7 
is a complete statistic. (Lehmann and Scheffe [1950)]). 

Existence of a complete sufficient statistic implies that, if s is esti- 
mable, then a uniformly minimum variance unbiased estimator of s 
exists, and this estimator is the unique function of 7’ which is an un- 
biased estimator of s. Such a function, say h(t), is uniquely defined 
by the identity 


or 


Since the coefficients of s‘ on both sides of this identity must be equal 


then 
ey 
t 
The statistic 
} §=T/nm+T — 1) (3) 


is therefore the unique minimum variance unbiased estimator of s. 
An alternative form for this is 
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+ — (1/)], (4) 


where, as usual, X = 7'/n is the sample mean. 

It is seen that the minimum variance unbiased estimator § has 
the same form when several samples are combined, whether from dif- 
ferent times of the year or different years, provided the distribution (1) 
holds. In this case X is the grand mean of all individuals sampled 
and n the combined sample size. 


Sampling variance of the estimator 


The exact variance of § is not expressible in closed form; however, 
an unbiased estimator of var (8) is easily constructed using the fact 
that 


— 1) ---(T—r +1) 


ter 


E 


The variance of § may thus be expressed 


var) = — 2 


showing that 
n+T-—2 
is an unbiased estimator of var (§). Moreover, by completeness of 7, 
this is the minimum variance unbiased estimator of var (8). 


An approximation to var (§) is obtained by noting that the maximum 
likelihood estimator of s is 


s* = T/n+ T) = X/(1 + X). (6) 
The asymptotic variance of Vn (s* — s) is s(1 — 8°); 
var (8) = [s(1 — s)*]/n (7) 


is therefore an approximation to var (8) for large n. 


4. Alternative (Poisson) Model 


We have assumed that the population has the geometric age dis- 
tribution given by (1). But it may be asserted that the population 
at any given time instant is the result of a random birth and death 
process and that this has not been taken into account in writing down (1). 
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Let Nx = number of individuals age X in population. We might 
alternatively assume: 

(a) Given Nx-, , Nx is binomially distributed with parameters 
(Nx_, , 8) ie. every individual has a probability s of surviving from 
year to year and survival of different individuals are independent 
events. It then follows by easy induction that 

(a’) Nx is binomially distributed with parameters (N, , s‘) where 
N, , the size of the zero group, is a fixed parameter. 

(b) The random variables Nx are independent. 

(c) The probability of catching any individual is \ and the event 
of catching one individual is independent of catching any other indi- 
vidual. 

Denote by n, , the number of individuals in the sample of age X. 
From (a’), (b), (c) it follows that n, is binomially distributed with 
parameters (N, , As’). Since \ is small, it is reasonable to assume that 
the distribution of n, is adequately approximated by a Poisson distri- 
bution with parameter N,As’. 

Writing (NoA) = u we obtain the distribution function of n. , n, , 


Ny . as 
= 


z=0 


With this model there is a positive probability thatn = )-°_, n, = 0, 
so that no observations are made and no estimation is possible. It 
is therefore necessary to work with the conditional distribution of 
the nx’s given n. It is well known that the n,’s conditionally follow 
a multinomial distribution and hence 


z=0 


Again it is seen that >> an, = T is a complete sufficient statistic 
for s and that 7’ has the distribution (2). Thus this model leads to a 
result identical to that given in Section 3. 


5. The “Jackson” estimate 
Jackson [1939] has given the following estimate for s 
8’ = (n, + mo + +m + (10) 


where k represents the oldest age group occurring in the sample and 
thus is a random variable. This estimator is widely used in practice 
though it is not well defined and can also take on values greater than 1. 
A modified form of this estimator, which was given by Heincke [1913], 
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8 = (m+n +m)/Mm +m + +m) (11) 

meets both of these objections and is in fact an unbiased estimator of s. 

It is seen immediately that n. , the number of age 0 individuals 

in the sample, is binomially distributed with parameters (n, 1 — s) 
so that E(s,) = s and 

var (85) = s(1 — s)/n. (12) 

As general theory requires, this is larger than var (§). We recall 
for large n, var (§) = s(1 — s)*/n, which will be much smaller than 
var (sj) if the mortality rate is substantial. 

The numerical value of sf will usually not differ greatly from s’ 
since n, , the number of individuals in the oldest age group k, will 
be small. 

Jackson also suggested another estimate similar to s’, namely, 


s? = + ma + +m + (13) 


This again can be modified slightly to give an estimate that is more 
tractable mathematically and does not have the undesirable properties 
that s® shares with s. In fact this modification has been used (Hylen 
et al. [1955}). 

It may be worthwhile to note that there is a general class of such 
modified estimators; viz., 


[(n, + Mu + n,)/n]'”” (14) 
= [(n — m — — — 


Since }-?=} n,; is a binomially distributed random variable with 
parameters [n, (1 — s) >-!=? s‘] or (n, 1 — 8”), then (n — >-?-} n,)/n 
is an unbiased estimate of s? with variance s’(1 — s”)/n. 


Write s” = @ and denote by 6 these estimates; i.e., 
= (n — n)/n, 
so that = 6”. 
We expand 6” in the Taylor series around 9; i.e., 


+ [(1 — — 6)? + Remainder. 
From this it follows that to terms of order 1/n 


Var = (1). (16) 
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This variance, considered as a function of p, is minimized for that 
value of p, say p,,(s) which satisfies the equation 1 — s? = } | logs | p. 

If this were to be used as a guide in practice, assuming some prior 
information on the general magnitude of s, p would have to be chosen 
as one of the integers near p,,(s). It may be shown that 7,,(s) is an 
increasing function of s; i.e., if variance alone is the criterion, then, 
when s is small, s} is preferred among this class while, when s is large, 
p should be chosen fairly large. 

In particular, for s = 0.5, p,,(s) lies between 2 and 3. We further 
note (again to order 1/‘n) Bias s\?? = —(1 — s*)/4s, which is negligible 
for s near 1 but may not be otherwise. Also Var s\”) < Var (s/) provided 
s> }. 

Insofar as s;”) and s™ are essentially equivalent, support is found 
for the statement made by Jackson [1939, p. 240] quoting W. L. Stevens 
in favor of s® over s’, a statement for which no proof has been given. 
It should be reiterated though that, if the geometric distribution (1) 
is accepted as the correct model, then § = X/[1 + X — (1/n)] is 
preferable to either. 


6. Estimation of the survival rate from truncated samples 


It has been tacitly assumed that there is no upper age limit in the 
population though there will be in the sample. There are situations 
however when it is known that the sampling gear is selective against 
older ages. There are also situations where the assumption of a con- 
stant survival rate is valid only for a limited number of age classes. 
In these cases the age distribution (1) is effectively truncated. 

Let k denote the maximum age on which the estimation of s is to be 
based. Any individuals older than k are disregarded. This maximum 
age is determined in advance and not from the sample. 

The age distribution of the portion of the population from 0 to k 
has the form 


fia) = z=0,1,2,---,k. (17) 


From general theory it is known that 7 = >-*_, X, is still a suffi- 
cient statistic for s, based on a sample of n observations from (17). 
The distribution of 7’ obtained by equating coefficients of like powers 
of z in the identity 


nk + (zs (1 s)” 


is given by 
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n-1 


where [¢/(kK — 1)] is the largest integer contained in t/(k — 1). 

Completeness of T can also be proven. This is not especially helpful, 
however, because it is no longer possible to construct an unbiased 
estimator of s. If there were some unbiased estimator of s, there would 
be an unbiased estimator which is a function of 7 alone, say A(T). 
This function h(T) would satisfy the identity (in s) 


(t/(k+1)] 


This identity cannot hold because the coefficient of s"“*”*' is 0 on the 
left-hand side (the largest power of s on this side is n + [nk/(k + 1)] < 
2n < n(k + 1) +1 fork > 1) while the coefficient of s"“*” *' is non- 
zero on the right-hand side. 

The maximum likelihood estimator of s is easily derived. Since 


k 
log L = T logs — n log (>> s'), 
t=0 


the maximum likelihood estimator s* is the solution of the equation 


k k 
X=Tm=s ts''/>s8' = Z,(9). (18) 
t=1 t=0 
To facilitate solution of this equation, Z,(s) has been tabulated in 
Table 1 for s = 0.05(.05)1.00 and for k = 2(1)7. If a more accurate 
solution than can be obtained from this table is required, it will usually 
be convenient to work with (18) in the slightly different form 


X = [s/(1 — 8)] — & + — (19) 


The solution can be obtained from (19) fairly rapidly’ by iteration. 
It is seen from Table 1, as may be verified algebraically, that Z,(s) 
is an increasing functiérf of s in the interval 0 < s < 1. There is there- 
fore at most one solution for s. It is possible there is no solution with 
s < 1; ie., the likelihood function is maximized for s > 1. In this 
case s = 1 is taken as the maximum likelihood estimate. 
Further, it may be computed that 


*For k = 1 the solution is s* = X/(1 — X) and fork = 2, s* =[—(1 — X) + V1 4 6X — 3k4/ 
— 2)}. 
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TABLE 1 


an 
ll 


k=3 k=4 k=5 k =6 k=7 


0.052256 | 0.052607 | 0.052630 | 0.052632 | 0.052632 | 0.052632 
0.108108 | 0.110711 | 0.111061 | 0.111105 | 0.111110 | 0.111111 
0.166311 | 0.174445 | 0.176091 | 0.176398 | 0.176459 | 0.176468 
0.225866 | 0.243590 | 0.248399 | 0.249616 | 0.249910 | 0.249980 
0.285714 | 0.317647 | 0.328446 | 0.331867 | 0.339206 | 0.333211 
0.345324 | 0.395907 | 0.416392 | 0.424194 | 0.427040 | 0.428046 
0.404075 | 0.477522 | 0.512062 | 0.527411 | 0.533955 | 0.536659 
0.461538 | 0.561576 | 0.614937 | 0.641990 | 0.655179 | 0.661421 
0.417398 | 0.647143 | 0.724183 | 0.767942 | 0.791927 | 0.804707 
0.571429 | 0.733333 | 0.838710 | 0.904762 | 0.944882 | 0.968628 
: 0.819330 | 0.957244 | 1.051410 | 1.114003 | 1.154670 
0.673469 | 0.904412 .078418 | 1.206364 | 1.298401 | 1.363335 
.200848 | 1.367718 | 1.496297 | 1.593837 
.323212 | 1.533318 | 1.705117 | 1.843937 
.444302 | 1.700920 | 1.921673 | 2.109996 
2.142434 | 2.387248 
.678625 | 2.033536 | 2.363805 | 2.670247 
. 790286 | 2.194782 | 2.582407 | 2.953399 
.897530 | 2.350637 | 2.795275 | 3.231475 
.0 2.5 3.0 3.5 


: 

8 


o 
to 
© 


(Aaymptotic) Var (@) = 1 


As is to be expected, the more ages included, the smaller the variance 
of s* for fixed n. 

To evaluate the modified Jackson estimator, or what should be 
properly called Heincke’s estimate, for the truncated situation, we note 
that n — n, is now binomial with parameters [n, s(1 — s)*/(1 — s**')] 
so that 
E(s;) = E[(n — n)/n) 


Var (8s) = [s(1 — s)/n]-[(1 — s*)/(1 — s**")?]. 


In this situation s, is biased though in most cases the bias will 
be small. 


7. Tests of the model 
We return now to consideration of the model given by equation (1). 
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As soon as 8 is estimated, the usual x’-test can be applied to test the 
conformity of the sample data to the assumed model. However, as 
pointed out earlier, sampling gear is frequently selective against younger 
ages and it is necessary to circumvent this difficulty by using in the 
estimation procedure only the sample data from the older age groups 
which are fully vulnerable to the method of capture employed. There 
may be, however, no way of knowing where this dividing line between 
complete and incomplete vulnerability falls, so that the data of the 
youngest age group used in estimation of the survival rate are suspect. 
We propose now to construct a procedure for testing the validity of 
the geometric model with particular emphasis on the zero-class. 

For this purpose the statistic sf = (mn — n.)/n, the modified form 
of the Jackson estimate, proves to be useful. The distribution of s/ 
as observed earlier depends on the unknown parameter s; however, 
the sufficiency of T = )-*., X, implies that the conditional distri- 
bution of sj for a given T is independent of s. 

We compute this conditional distribution by first noting that the 
joint conditional distribution of X, , --- , X, for a given T is 


= 


1 
if 


(22) 


0 otherwise; 


ie., every ordered set z, , --- , x, of non-negative integers such that 
>3., z; = ¢ is equally likely. The number of ordered sets containing 
exactly mn) zeros, n, ones, n, twos and n, ?’s is simply a multinomial 
coefficient, so 


n! 1 
flo | Dein, = = iy (23) 
t 
Verification that (23) is a probability distribution is obtained by 
equating the coefficients of s‘ on the left- and right-hand sides of 
The conditional distribution of n, is then 


t 


where indicates the sum is over the set n, ,---,, such that = 


: 
n (n — nm)! 1 pare: 
fm | Din, = ) = (" (24) 
i 
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nN — M, da, im; = 1, and, as seen by the coefficients of s‘ on the 
left- and right-hand sides of (s + s? + s*+ 
the distribution (24) is the hypergeometric distribution 


An exact test of the validity of the model then consists of com- 
paring the observed statistic s{ with the critical values, say 8.92.5 and 
S.o7s , computed from the hypergeometric distribution (25). Because 
of the close approximation of the binomial to the hypergeometric 
distribution, however, such test procedures are ordinarily replaced in 
practice by the binomial test. This approximate test procedure would 
then consist of entering a table of binomial confidence intervals (or 
using the Clopper-Pearson charts) with ‘“‘sample size’ = T and “number 
of success” = T — n+ np; if the resulting confidence interval includes 
§, then the validity of the model is accepted, and otherwise it is rejected. 
‘or large samples, of course, this procedure may be replaced by the 
ormal test; for fixed 7’, the statistic 


me T | — — 1) 
n — — 2) 


= — §)/V var (so | T) 


has mean 0 and variance 1 and is asymptotically distributed as a 
standard normal deviate. The approach of the hypergeometric to the 
normal distribution is rapid, and, with the sample sizes commonly 
employed in this type of experiment, say n > 100, little is to be gained 
by using the more exact binomial or hypergeometric test procedures; 
furthermore, the availability of extensive tables for the normal dis- 
tribution permits experimenters to make one-tailed tests of the hypoth- 
esis of a deficit in the zero-class. 

Theoretically, a test of the model based on the zero-class frequency 
is still available for the truncated case since the distribution of mp , 
given the sufficient statistic T,; does not depend upon s. This con- 
ditional distribution can be obtained by methods similar to those used 
above, but it assumes an awkward form and has not been tabulated. 
We therefore omit it here. 


(26) 


8. Regression and other estimates 


The earliest method of analysing catch curves was to graph them 
on semi-logarithmic paper. Such purely graphical techniques led 
eventually to fitting a regression of the logarithm of the catch of age X 
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on age by least squares, and hence a regression estimator of the survi- 
val rate. 


Though not explicitly formulated, the assumptions underlying this 
method are that 


E (log n.) = K + z (log s) (27) 


where K is an unknown parameter, the other symbols being as specified 
earlier. Further, it is assumed that var (log n,) is independent of z. 
It is convenient to use natural logarithms in this transformation 
because log, s = i, the instantaneous total mortality rate. In many 
problems 7 may be of interest rather than s and, as Gulland [1955] 
has pointed out, unbiased estimates of s do not convert directly to 
unbiased estimates of 7. y 
Concerning the second of these assumptions, there is considerable’ 
empirical evidence to suggest that haul data, when transformed by a 
logarithmic transformation, has approximately a constant variance ~ 
(Jones [1954], Windsor and Clark [1940]). 
It is also reasonable to assume that 


E(n,/N.) = (28) 


where we now follow the approach of Section 4. We relax the 
assumption however that N,.and s are constants and instead assume 
they are independent random variables with mean values Ny, 3. Fur- 
ther, let No — No = SN ands — 5 = 3s. 

The second assumption is that 


E(N.| No, 8.) = No (29) 
where s, , --- , 8, are the z realizations of the random variable s associ- 


ated with N, . 
Then by the usual Taylor series expansion 


E(log n, | No +++ 8) = log (ANS) + log 
— {o (.)/[E(@,)]*} + Remainder, 
log No = log No + log (1 log Ny — oY) 


| 
log 8; = [toes o( 8) | 


(30) 


If the variances of the random variables n, , No , s are small compared 
to their mean, then 
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E(log n.) = log» + X logs (31) 


where here » = (AN,) so that an unbiased estimate of log 3 is obtained. 
In particular, if n, has a Poisson distribution as postulated in 
Section 4, and, if N, , s are fixed parameters, 


E(log n.) = log + X logs — (1/us*) + o(1/ys*). (32) 


In this case E[1/(n, + 1)] = (1/us*) + o(1/us*)* and thus the approxi- 
mation would be improved if (27) were replaced by 


E{log n, — [1/(n. + 1)]} = K + x(logs), (33) 


that is, if the estimate of 7 is obtained from the slope of the regression 
of {log n, — [1/(n, + 1)]} on x. In either case it is clearly indicated 
that the approximation becomes doubtful when us” is small. 

It seems, therefore, that the sample should be truncated to eliminate 
age groups such that E(n,) is small. Any actual rule is, to a large 
extent, arbitrary and the problem is further complicated in that the 
application of it must depend on the random observations actually 
obtained. However, it would certainly seem desirable to eliminate tail 
values to the right of the point at which n, falls below 5. The retention 
of small n, in the estimation will bias the estimates of 7 positively. 

Actual use of the regression estimate should of course be limited 
to situations where it is doubtful that the stronger assumptions re- 
quired for (4) fail. While the assumptions noted in this section are 
weaker in that N, , s may be random variables, there is still the re- 
striction that they be independent. If size of different year classes 
or survival of the several year classes are interdependent random 
variables, then the process may be much more complicated. 

The regression estimate may be preferable when the age distri- 
bution in the population has the geometric form (1) but the sampling 
is not random. This may be the case if there is segregation by age 
or if ages are obtained indirectly through an age length key, for example. 
According to this common practice all fish are measured but ouly a 
subsample aged. 

Another estimate used in practice is the average of the ratios n,/n , 
N/M, , -** , M%/N,-, where k is the oldest age group in the sample or 
some preassigned maximum age for which data are used. This esti- 
mator, like s’, is not well-defined, but a slight modification such as 
adding 1 to each denominator will remedy this. Estimators of this 
form are used even in situations when the survival rate is known to 
vary with age. The ratio n,;/n,_, is, in fact, the maximum likelihood 
estimator of s; under the model - 
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f(x) = 8182 8,/(L + 8: + 8:82 + + 818283 &) (34) 


i.e., where s; is the proportion surviving from age 7 to agei + 1. Then 


1 fn, , 
+ ny 2] (35) 
is the maximum likelihood estimator of the average annual survival 
rate § = (8, + --- + 8,)/k. 
The vector (np , , , rather than 7 = in, is the suffi- 
cient statistic in this case. The multinomial distribution of this vector 
can be written down easily. Also, a rather odd modification of n;,,/n; , 


+ — = 0|T 1) (36) 
can be shown to be an unbiased estimator of s;. Thus 


if T>0, 
(37) 
=0 if T=0 
is an unbiased estimator of 3. 
We omit the proof of this. 


9. Estimation of instantaneous mortality rate, 1 


Since in many problems 7 is more useful than s, it is worthwhile to 
note that for the geometric model (1) no unbiased estimate of i = —log s 
is possible. This is easily seen by writing down the identity that must 
be satisfied; viz., 


+ — 8)" = —logs. 


As s tends to zero the left-hand side tends to h(0) while the right- 
hand side is unbounded. 

Let us define t = —log § = log (n + T — 1) —log T, and expand 
i(s) in a Taylor series about s, 


so that to terms of order (1/n), Bias ¢ = (1 — s)?/ns, Var 7 = (1 — s)?/ns. 
The approximations here are due to the fact that var § is given 
only approximately by s(1 — s)?/n, though it will certainly be an 
adequate approximation for the sample sizes found in practice. 
If the bias of ¢ is regarded as important, one can attempt to correct 
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for it, up to terms of order (1/n), by setting 7* = —log § — g(t) when 
to terms of order (1/n) E[g(é)] = (1 — s)’/ns. 

Unfortunately, (1 — s)*/s is also not estimable but it can be shown 
that, if we choose g(t) = (n — 1)(n — 2)/n(n + ¢ — 1)(¢ + 1), then 
Elg(t) — (1 — s)?/ns] = —1/n{(1 — s)"/s]. The right-hand side of 
this expression will be small for usual values of n and s. 
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MAXIMUM LIKELIHOOD ESTIMATION OF GENETIC 
COMPONENTS OF VARIATION 


B. I. HayMan 


Applied Mathematics Laboratory, D.S.I.R., 
Private Bag, Christchurch, New Zealand. 


INTRODUCTION 


The expectations of variances and covariances of metrical characters 
in genetical experiments may in many cases be expressed (Mather [1949! 
as a sum of two genetic components, D measuring additive effects ani 
H measuring dominance effects, together with appropriate environment: 
components E, , EF, , etc. If we writec = (D, H, FE, , E,, ---) fo 
the row vector of components and v for the column vector of observe: 
variances and covariances then, in these experiments, 


év = Mc 


where M is a matrix of coefficients determined by the genetical structure 
and physical layout of the experiment. The least squares estimate 
for c is é = (M’M)~'*M’v where M’ is obtained from M by inter-changing 
its rows and columns. Further, if s’ is the residual variance after 
fitting the constants c, then the matrix (M’M)~'s’ contains the variances 
and covariances of the estimates ¢. (M’M)™ is the multiplier matrix 
of Mather [1949]. 

This method of estimation is correct when the observed variance~ 
and covariances are uncorrelated and have equal errors, but these 
conditions are seldom met. The sampling error of an estimated vari. 
ance varies with the magnitude of its expectation and inversely with 
its number of degrees of freedom. Indeed, in one experiment the errors 
may vary by as much as a factor of 10 so that the residual variance 
in the ordinary least squares analysis is too large for components 
estimated from the smaller variances and too small for components 
estimated from the larger variances. Mather [1949, p. 101] pointed 
out this difficulty and suggested that it could be overcome by a weighted 
least squares analysis. The purpose of this article is to investigate 
this weighted analysis and to compare it with other analyses. The 
illustrative genetical material is taken from a selfing experimen: 
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Nelder [1959] has also considered this problem from a different point 
of view. 


THEORY 


It may be possible to estimate the errors and the correlations of 
the various second-degree statistics from replicate values of v. Then, 
if V is the variance-covariance matrix of the errors of v, the least squares, 
estimate of the components is 


é = (M’V"M) 'M’'V''v. (1) 


(M’V-'"M)~ is the variance matrix of é and supplies the standard 
errors and correlations of the estimators of the components. The 
chi-square testing goodness of fit is — 

Alternatively, since metrical characters are often distributed nor- 
mally in families, V has a theoretical expectation in terms of the true 
value of v. Then the maximum likelihood equation for the estimate 
of c is again equation (1) except that V is to be calculated, not from 
the observed variances and covariances v, but from their expected 
values Mé. Thus equation (1) only gives é implicitly. The explicit 
solution is obtained iteratively by computing V initially from the 
observed value v, finding a first estimate é, , recomputing V from 
Mé, , finding a second estimate é, , and so on. 


THE EXPERIMENT 


The genetical structure of the experiment is as follows. Two inbred 
lines of Nicotiana rustica (lines 1 and 2 of Hayman’s [1958] Experi- 
ment 2) were crossed and one plant of the resulting F, family selfed 
to give an F, family of 25 plants. Ten F, plants were selfed to give 
F, families of 5 plants each. All the F; plants were selfed to give F, 
families of 5 plants each. A few plants failed to grow. Since inbred 
lines can be reproduced genetically unchanged generation after gen- 
eration, it is possible, by repeating the original cross each year, to grow 
all the generations simultaneously in the final year. The experimental 
material up to F, was grown in 1954 and in 1955, the P, F, , F, and Fs 
generations in 1954 being the respective parents of the P and F, , 
F, , F; and F, generations in 1955. 

In the physical layout, plants were grown in plots of 5 and the 
plots randomised in each block. Each block contained 5 plots of each 
parental line and of the reciprocal F, and F, families while each F; 
or F, family occupied one plot. Two such blocks made up the experi- 
ment in each year. ‘ 

The heights of the plants were measured in inches and the time 
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of opening of their first flowers was recorded to the nearest day. The 
second-degree statistics computed from these observations, together 
with their expectations, are: Variance within parental and F, plots, E, , 


SE, = E,. 
Variance between duplicate parental and F, plot means, EF, , 
&E, = FE, + 
Variance within F, plots, V,r2 , 
= 3D+7H+E,. 


Variance of F; family means, , 


Vin 


Mean variance within F; families, V2r3 , 
EVors = 4D + 4H + E. . 
Variance of means ofjgroups of F, families descended from one F, 
plant, 
1 
Mean variance of means of F, families within groups, V2,, , 


Mean variance of F, families, Vr. , 
1 1 
= GH +E. : 


These variances were computed for both 1954 and 1955, the former 
having expectations as above and the latter having similar expectations 
in terms of D’, H’, EZ and E{ . The covariances between the two 
years were also computed. These with their expectations are: 


Covariance between F, individuals in 1954 and their F; family means 
in 1955, 


EW i ros 4D” + 


Covariance between F; family means in 1954 and the means of groups 
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of corresponding F’, families in 1955, Wi rs. , 
EWira = 5 D + 354 + 


Covariance between F; individuals within F, families in 1954 and their 
F, family means in 1955, Wars. , 


1 1 
EWarss 4 + 16 


The expectations are taken from Mather and Vines [1952], with 
the addition of the bracketed terms to allow for sampling (Nedler, 
[1953]). When the families contain as few (5) plants as in our experi- 
ment, this correction is important, especially to the coefficient of H 
which the correction, whenever it occurs, increases by at least 40%. 
E, and E, are the components of environmental variances within and 
between plots. These were estimated from parental and F, families 
which all gave similar values and were averaged. D = 2d2, D’ = 2d? 
and D” = = d, di are the additive genetic components, d, being half 
the difference in effect between genotypes AA and aa in 1954 and d/ 
the corresponding difference in 1955. H, H’ and H”’ are the dominance 
components with similar definitions in terms of the dominance devia- 
tions, h, and h, . In the absence of genotype-environment interaction 
over the two years D = D’ = D’ and H = H’ = H” and all the second- 
degree statistics can be expressed in terms of the one pair of genetic 
components D and H together with appropriate environmental com- 
ponents. Any interaction of the genotype with the environment other 
than a proportional change in effect reduces D’’ below the geometric 
mean of D and D’ and H” below the geometric mean of H and H’. 


RESULTS AND DISCUSSION 


We shall make three investigations. The first is a comparison of 
our method with two other methods, illustrated by the 1955 height 
data. The second is of a design suitable for unweighted least squares 
analysis. The third is the use of our method to study genotype- 
environment interaction in flowering time, again followed by a design 
suitable for unweighted least squares analysis. 


1. Comparison of methods. 


In this investigation we use the 1955 height data. The components 
are estimated from the variances in that year so that the variance 
matrix V takes the simple diagonal form. Since V,,. , for example, 
does not involve the actual parents of the families involved in V,,; , 
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the correlation between them which would appear in sampling from 
a bivariate normal population does not occur here. Nelder [1959] 
finds that genetic correlations between them are small enough to be 
neglected. 

The maximum likelihood analysis therefore reduces to iterative 
weighted least squares. The above equations for the 1955 variances 
are initially divided by v(2/f)'”’, where v is the variance concerned 
and f its degrees of freedom, and then D, H, L’,, and E, estimated by 
ordinary least squares. Expected values 6 of the variances are then 
computed, each equation is redivided by #(2/f)'” and D, H, E,, and 
E, estimated again. The iterations are continued until the chi-square 
testing goodness of fit of the components D, H, F,, and F, reaches 
a stable minimum. This chi-square is computed after each iteration 
as the sum of squares of the deviations of observed variances from 
expected variances, each square being weighted with f/26”. 

Table Ia cqntains in its second column the observed variances 
averaged over blocks and reciprocals (Z, and LF, have been averaged 
over parental and F, families as well). The degrees of freedom are 
in the third column. 


TABLE Ia 


HerGut In A Nicotiana rustica SELFING EXPERIMENT IN 1955. 
VARIANCE EsTIMATES, EXPECTATIONS AND WEIGHTS 


Least 
Degrees squares 
Vari- of Expectations from M.L. iterations | expecta- | Final 
ance | Value | Freedom 1 2 3 4 5 tions weight 
Vir2 | 69.29 80 64.49 64.87 65.07 65.05 65.07 | 63.82 . .10 


Virs | 43.12 36 61.80 68.37 67.88 67.95 68.16 | 61.12 | .06 
Virs | 36.66 160 38.88 39.04 39.11 39.11 39.12 | 39.39 23 
Virs | 67.84 36 48.26 57.79 56.68 56.84 57.11 | 57.39 07 
Vare | 41.29 153 37.84 40.76 40.67 40.68 40.79 | 33.52 21 
Vars | 26.47 770 26.07 26.12 26.14 26.14 26.14] 27.18 75 
E, | 12.95 160 13.27 13.20 13.16 13.17 13.16 | 14.97 68 
E, | 14.06 32 13.88 13.16 13.47 13.42 13.42 5.92 30 


Table Ib contains the estimates and the standard errors of D, 
H, E, and E, from five cycles of iteration. The expectations of the 
variances after each cycle are in Table Ia. The fluctuations of the 
estimates with iteration are less than their standard errors and very 
much less indeed in the last three stages. The chi-square testing 
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TABLE Ib 
Heteut 1n A Nicotiana rustica SELFING EXPERIMENT IN 1955. 
EstTIMATEes OF COMPONENTS AND STANDARD ERRORS AND TEST OF 
GoopnEss oF Fit or Data To THEORY 


Estimates from M.L. iterations Least squares Estimates 

1 2 3 4 5 estimates (Nelder) 
D 79.98 99.01 96.65 96.99 97.51 101.61 101.61 
H 44.93 8.67 14.31 13.58 12.63 —7.83 —7.83 
Ey 13.27 13.20 13.16 13.17 13.16 14.97 14.97 
Ey 11.23 10.52 10.84 10.79 10.79 2.93 2.93 
8p 20.04 17.40 19.36 19.23 19.31 16.00 17.91 
Sz 50.64 45.08 48.51 48.24 48.37 59.83 79.13 
Sre 1.43 1.35 1.35 1.35 1.35 6.36 1.25 
Sx 3.65 3.07 3.01 3.07 3.06 5.94 1.68 
x? 5.87 3.67 3.71 3.56 3.68 


goodness of fit stabilizes at the second iteration so that two iterations 
are enough. 


Unweighted least squares estimates and errors are also included 


in Table Ib. There is agreement with the previous results for the 


genetic components but not for the environmental components. This 
is because the unweighted least squares analysis does not take account 
of the greater accuracy expected in the smaller variances, E, and F, . 

Finally, acting on a suggestion of Nelder [1953], unweighted least 
squares estimates were computed for each block and for each recipro- 
cal cross. The mean values of the four estimates of each parameter 
coincide, of course, with the previous single least squares estimates. 
The four estimates, however, provide a third and empirical estimate 
of the standard deviation which agrees reasonably with our original 
standard errors based on the assumed normality of the experimental 
measurements. 

It may be useful to add that estimating equations in which no 
allowance has been made for sampling variation all give smaller, and 
in fact negative, values of H. This was found also by Nelder [1953]. 
It is possible that some of Mather’s [1949] negative estimates of H 
would also disappear with this correction. These were not significantly 
negative and therefore of no real consequence but we wish to reassure 
some who feel that negative, even though non-significant, estimates 
of the intrinsically positive H in some way invalidate the D and H 
model of quantitative variation. 
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The implications of the experiment are clear. Heritability, esti- 
mated as 60 percent, is high and genetic differences in height between 
the two parents are important. The genes concerned are additive 
in action, the negligible value of H and the lack of significance of 
the chi-square showing the unimportance of dominance and epistasis. 


2. Experimental design. 


In this experiment we could have adjusted the experimental de- 
sign to reduce the need for a weighted estimation of the components. 
In the last column of Table Ia are the weights used in the final 
cycle of the iterative analysis. If these had been equal, the ordi- 
nary least squares analysis would have sufficed. The formula 
(f/26°)” for a weight depends on both the expected variance and 
its degrees of freedom. If we knew the variance estimates before- 
hand, we could arrange the numbers of degrees of freedom to almost 
equalize the weights. In practice, we are only likely to know the 
relative sizes of variances from segregating and non-segregating gen- 
erations, here about 3} : 1. We could redesign the experiment so that 
the degrees of freedom of variances within these two classes would 
be equal and so that the degrees of freedom of variances from segre- 
gating generations would be (33)’, or about 12, times the degrees of 
freedom within non-segregating generations. The variation between 
weights would then be only the variation between variances within 
the two classes. The easier unweighted analysis could then be used 
with much greater confidence than at present. 

Consider first the F; generation. If n families were descended 
from each reciprocal F, , and if each family contained two plants, 
then V,r3 would have 2n — 2, and V2»; would have 2n, degrees ot 
freedom. Similarly, the three /, variances would have about equal 
degrees of freedom if both plants in each F; family were selfed and 
one offspring of one, and two of the other, were grown. If the F, 
generation were needed, single offspring of all /', plants would be grown 
except that one member of each F, family pair would provide two 
offspring. An alternative and less complicated design in /’, would 
be to grow only two offspring of one member of each /, family but 
then no measure of V.,, would be available. 

In our experiment, plot size was family size in the F; and F’, gen 
erations and a submultiple of family size in other generations. In the 
new design, families would contain only two or one plant so that the 
plot unit would be abandoned and the plants randomised individually 
in each block. The environmental component of variance between 
plot means, FE, , would no longer exist. Thus F, would not be esti- 
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mated and E, would be deleted from the expectations of the variances 
leaving only E,, , the component of variance within genetically homo- 
geneous (parental or F,) groups of plants. 

The new design shifts the emphasis on the generations. The pro- 
portions of parental and F, , to F, , to F; , to F, plants in our experiment 
were 1 : 3: 1 : 5 while in the new design the proportions would be 
1 : 12 : 24: 36. In other words, the numbers of plants in one block 
of the experiment were 100, 50, 100 and 500 respectively in each gen- 
eration while the recommended numbers would be 10, 120, 240 and 
360 for about the same total number of plants. The ratio (1 : 12) 
between the first two recommended numbers is determined by the 
heritability in the experiment as shown above, and the ratios between 
the last three numbers (1 : 2 : 3) are fixed by the necessity to estimate 
1, 2 and 3 components of variance from the corresponding generations. 


3. Genotype-environment interaction. 


The third investigation concerns flowering time in the two years, 
1954 and 1955, of the experiment. It appeared from separate pre- 
liminary analyses of each year’s data that, while the additive genetic 
variation was similar in each year, dominance variation, present in 
1954, was absent in 1955 and that the environmental variation was 
less in 1955 than in 1954. 

Nineteen variances and covariances are available from which to 
estimate ten components of variation. These have been described 
earlier in this article. The variance matrix V determining the weights 
and correlations of the observed variances and covariances is no longer 
diagonal because parents and offspring occur together in the experi- 
ment and statistics involving them are correlated. There are three 
uncorrelated groups of three statistics each and ten further uncorrelated 
statistics. The three groups are (Vir2, Wires, Virs), (Vies, Wirse, Vira) 
and (Vsrs;, Wars. , Vir.) the plain variances arising from 1954 data, 
the pointed variances from 1955 data and the covariances from the 
joint data. 

To test for fit to our genetic model and for seasonal changes in 
the components, the «nalysis is carried through on four models: 


1. Ten components of variation fitted: D, H, FE, , FE, , D’, H’, 
‘EL, D", 

2. Eight components: D, H, D’, H’, D’, H", E, (=E), E, (=E)). 

3. Six components: D (=D’ =D”), H (=H' =H"), EL, EB, 
Es 

4. Four components: D, H, E, , E, . 
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The detailed mechanics of the fitting are available in statistical text- 
books and will not be given here. In any case we will later provide 
an alternative analysis which is simpler and reasonably accurate. 

Table IIa contains, for each iteration, the values of chi-square 
testing goodness of fit to each of these models together with some 
of the differences between these chi-square values. Once again two 
iterations seem to be enough. The most important chi-square is the 
first, testing for deviations from our most complex model. It is not 
quite significant at 5 percent so that there is a suggestion that epistasis 
or linkage might be disturbing our model. We can, however, proceed 
to test for heterogeneity of components over the two years. The test 
of overall heterogeneity is provided by the difference between chi- 
square from models 1 and 4. This is highly significant. Environ- 
mental heterogeneity is tested by comparing models 1 and 2 and genetic 
heterogeneity by comparing models 1 and 3. Since the genetic com- 
ponents, D, H, etc., are not orthogonal to the environmental com- 
ponents, F, , E, , etc., in the estimation equations the chi-squares 
testing genetic and environmental heterogeneity do not sum to the 
overall heterogeneity chi-square. It appears that the seasonal change 
is due to changing environmental variability and not to change in 
the genetic components. 

Final estimates of the ten components of model 1 together with 
their standard errors and their correlations are presented in the first 
section of Table Ilb. The components of variation in the separate 
years are significant, at least at the 5 percent level, but of the two 


TABLE Ila 
FLOWERING TIME IN A Nicotiana rustica SELFING EXPERIMENT IN 1954 AND 
1955. Tests oF GoopNEss OF Fit or Data To THEORY AND TESTS 
oF HETEROGENEITY OF COMPONENTS 


Degrees | 5 per 
Iterative chi-squares of cent 
1 2 3 4 Freedom | value 


22.20 16.36 16.06 15.98 § 
66.63 39.42 39.13 38.99 il 
32.20 23.41 23.17 23.11 i3 
430.68 191.74 191.18 191.24 15 


Model 


Overall 408.48 175.39 175.12 175.26 
Heterogeneity Environmental | 44.43 23.06 23.07 23.01 
Genetic 10:00 7.05 7.12 7.38 
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components of genetic covariation only D” approaches significance 
(probability 7 percent). The magnitude of the additive variation does 
not change over the years and the value of D” suggests that the same 
genes are controlling flowering-time in each year. Dominance varia- 
tion decreases from significant over-dominance in 1955, although we 
have seen that, taken together with the stability of additive variation, 
this decrease is not great enough to give any marked overall change 
in genetic variation. If the very small value of H’” were an accurate 
measure, so that we could infer 2h,hi = 0, we would conclude not 
only that dominance decreased from 1954 to 1955 but that it was 
unrelated in sign and magnitude in the two years. However, the 
large standard error of H” in our case precludes any such inference. 
The environmental variation decreases to about half in 1955 and 
heritability, measured by D/(D + E,), increases from 54 percent 
to 68 percent. 

Since genetic heterogeneity is not significant on the whole it is 
appropriate to obtain estimates from our model 3 also. These are 
in the second section of Table IIb, the common estimates of additive 
and dominance variation being tabulated under D’” and H”. All 
components are now highly significant. Dominance seems to be com- 
plete with no real suggestion of over-dominance, (2 — D)/s d = 1.48. 
Environmental variation drops by rather more than half from one 
year to the next and heritability rises accordingly from 43 percent 
to 69 percent. 

The model of genotype-environment interaction in this investiga- 
tion may be compared with that of Mather and Jones [1958]. They 


described the general influence of these interactions on the variances 


and covariances of segregating generations. We use a very much 
more restricted model in which the interactions are assumed to be 
negligible in any one year but may be important in the comparatively 
larger environmental change from one year to the next. 

We can redesign this experiment to suit the unweighted least squares 
analysis. If the generation sizes are adjusted according to the plan 
in the previous section, components D, H, E, , E, and D’, H’, E{ , E; 
can be obtained by separate least squares analyses of each year’s vari- 
ances and the two components of genetic covariation, D’ and H”, 
from a similar analysis of the covariances between the two years. The 
main weakness in such an analysis would appear to be the neglect 
of the sampling covariances between statistics involving related indi- 
viduals but we shall see that this is not important. 

If the two-year experiment is analysed as it stands by ordinary 
least squares, we obtain the estimates and errors in the third section 
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of Table IIb. Once again the errors of the environmental components 
are too large but there is a further discrepancy in the dominance com- 
ponents. Whereas the maximum likelihood analysis suggests a fall 
in dominance effect from 1954 to 1955, the least squares analysis pro- 
vides no evidence for this change and even suggests that dominance 
is not significant at all. However, the main point is that the two kinds 
of analysis give discordant results. Is the cause the incorrect weighting 
of the statistics (which can be remedied in part in the experimental 
design) or the neglect of the sampling covariances of the statistics? 

The answer lies in a maximum likelihood analysis with the sampling 
covariances omitted. In other words, the iterative analysis of our first 
investigation is applied separately to each year’s variances and to the 
covariances. In the analyses of the variances in 1954 and 1955 the 
weight for the equation in the variance »v is again (f/20°)””. In the 
analysis of the covariances the weight for the equation in the covariance 
w is [(00’ + 1°)/f]*” where # and 9’ are the corresponding expected 
variances in the other two analyses. The final section of Table IIb 
contains the resulis of these analyses after three iterations and it is 
clear that they agree with the results in the first, maximum likelihood, 
section and not with the results in the third, least squares, section. 
In this experiment, at least, the sampling covariances are unimportant 
while the sampling weights are most important. 

Our recommendation then is to design a two-year experiment so 
that each year follows the principles of our design section and all 
variances and covariances with genetic content have about equal degrees 
of freedom. Each year’s variances and the covariances can then be 
analysed separately for component estimates. If they are homogeneous, 
the three error mean squares may be pooled for error estimates. 

A three-year experiment would follow the same pattern with six 
separate least squares analyses, three of the variances in each year 


and three of the covariances between each pair of years. 


SUMMARY 


Maximum likelihood estimation of components of variation in gene- 
tical experiments is compared with Mather’s [1949] least squares 
analysis. The former is an iterative method and appears to require 
two cycles of computation in comparison with the single cycle of the 
latter. This extra labour is offset by the ability of the former method 
to handle a wide range of variance magnitudes. The latter method 
over-estimates the errors of components derived from the smaller vari- 
ances and supplies correspondingly inaccurate estimates of these com- 
ponents. The reverse situation holds for components estimated from 
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the larger variances. Nelder’s [1953] method supplies the same esti- 
mates of components as the least squares method but its error estimates 
agree with the maximum likelihood estimates when the observations 
are normally distributed. 

A new experimental design is suggested more suited to Mather’s 
[1949] easier least squares analysis. In the segregating generations all 
variances and covariances should have about equal numbers of degrees 
of freedom and the relative numbers of plants in segregating and non- 
segregating generations should be determined by the heritability. 
Families contain only one or two plants and randomization is by 
plants and not by family plots. 

The illustrations are drawn from a selfing experiment in Nicotiana 
rustica and include a discussion of genotype-environment interaction 
when the experiment is repeated for more than one year. 
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POLYCHOTOMOUS QUANTAL RESPONSE IN 
BIOLOGICAL ASSAY* 


Joun Gurtanp’, Inpox Lze*, Paut A. Daum? 
Towa State University, 
Ames, Iowa, U.S. A. 


1. Introduction 


Quantal response in biological assay is usually based on two possible 
outcomes. In fact, this response is often referred to as all or none, 
and a common example of the two possible outcomes is death or sur- 
vival. In a bioassay such as that of some insecticides based on mortality 
of the housefly, say, there are, however, three possible outcomes, namely 
alive, moribund, and dead. There is a definite order of these outcomes 
in that alive precedes moribund and moribund precedes dead. More 
generally, in other types of assays, many outcomes might be possible; 
for example, varying degrees of moribundity besides alive and dead. 

For the purpose of simplifying the presentation the present results 
are discussed in the context of insecticidal assays based on mortality 
and moribundity of the housefly. The extension to any number of 
outcomes s > 2 is immediately apparent, and the general formulas 
will be indicated in the Appendix. If one uses Normits (cf. Probits), 
Logits, or other monotonic transformations, a general method of ana- 
lyzing the data is elaborated which makes use explicitly of all the 
possible outcomes. Since this method uses more information, it is more 
efficient than the common procedure of pooling some outcomes (for 
example, moribund and dead) in order to make the response dichoto- 
mous. The estimates of the parameters used in the general method 
are obtained by minimizing appropriate chi-square expressions which 
are generalizations of x* (Normit) and x’ (logit) of Berkson [1], [2], 
[3] and have the same asymptotic properties of consistency and efficiency. 

An approach to the problem based on the method of maximum 
likelihood has recently been given by Aitchison and Silvey [10] and 
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Ashford [12]. Although maximum likelihood and minimum chi-square 
both yield asymptotically efficient estimators, the latter technique, 
which we present here, is simpler to apply. A comparison of the small 
sample properties of maximum likelihood and minimum chi-square 
estimators in the case of a dichotomous response is given by Berkson 


[1], [3]. 
2. Polychotomous quantal response. 


Suppose k groups consisting of n, , --- , m houseflies are exposed 
to dosages 1, , --- , 2 respectively. Out of the n, flies exposed at 
dosage x; suppose that at the given time of observation 

r;, are dead, r;, are moribund, r;, are alive. 


Write the observed proportions as 


Pa = D2 = Ti2/N; = = 1 — Pir — Di2- 
Let 


Pa = Epa, = Epie2 , =1—-Pa— Pe 


be the corresponding expected proportions or true probabilities. Let 
us first consider a normal distribution of tolerance dosages. Write ¢ 
as the cumulative normal distribution function. That is: 


Then 


Py = + Br,; 
+P, = + Bz;) 


where 
B=1/o, a? = j= 1,2. 


This assumes a normal tolerance distribution 9(u“”, o°) of lethal 
dosages and a normal tolerance distribution %(u", o°) of moribund 
dosages. Furthermore u‘”? > yw”. Since a fly becomes moribund 
before it dies, the expression in (2), which is the probability a fly is 
moribund or dead, must involve the same parameter 8 as in (1). If 
the 8 were not common, the two curves would cross, but this is obviously 
not permissible since P;, + Piz > P;, . It is also possible to justify 
(1) and (2) by means of the hypothesis of Hewlett and Plackett [4] 
that every quantal response is the result of an underlying graded 
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response reaching a certain level of intensity (c.f. Ashford [12]). The 
inconsistency introduced when the curves cross appears to have been 
overlooked by Aitchison and Silvey [10]. 

Denote the normit V corresponding to probability-P by 


V=¢'(P) 
where ¢"' is the inverse function corresponding to ¢. Let 
= ¢ (pi); = (pir + pia) 


be the empirical normits corresponding to p,, and (pj, + pia) re- 
spectively. 

To express the weights to be used in the minimum x’-procedure 
of estimation, we require the ordinates of the normal density curve 
corresponding to the empirical normits above. 

Let 


Then the weights w,,, to be used at the ith dosage z, are 


Wn = + » Vis = » = + +). (4) 
Define 

x’ (Normit) = Dd 


+ Wav” — a” — (5) 


Minimization of (5) with respect to a’, a”, 8 yields consistent asymp- 
totically normal efficient estimators 4‘, a”, 8. (See Appendix). 
These are expressible in matrix notation as 


6=C'y 


where 


j 
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k k 
1 1 


= + Wir2) + Wi22) + Wy22) 
and 
1 
= n (wir? + (8) 
(wear Wii2) + (Wire + Wi22)| 


It will be noted that instead of one Normit—dosage line, as in the 
case of a dichotomous response, there are now two parallel Normit— 
dosage lines 


a” + Bx j= 1, (9) 


corresponding to the outcomes dead and dead plus moribund respec- 
tively. 

An. over-all x’-test based on deviations from the estimated lines 
is afforded by the statistic 


k r 


which is asymptotically distributed as x° with (2k — 3) degrees of 
freedom. 

This is a more sensitive x’-test than in the case of a dichotomous 
response since the latter is based on only (k — 2) degrees of freedom. 

If a transformation other than normits is used, all the above formulae 
are modified by replacing the normits and the weights accordingly. 
Thus, if the underlying distribution of tolerance dosages is regarded 
as logistic, the probabilities P;; are given by 


Pa t+ Pa = [1+ 
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and the corresponding empirical logits by the inverse functions 
= log (pia + Pi2)/pis 
where 
Qi = 1— pi; 
The weights w,,, are now given by 


= + 2) 
and x” (Logit) by 


+ = = a” = Bzx;) 


(13) 
(14) 


(15) 


(16) 


The symbols a‘”, 8 appearing here should not be confused with 


those appearing in (5) in the case of x” (Normit). 
3. Generalized parallel line assay. 


If our purpose is to estimate the relative potency from an assay 
on Test and Standard Preparations, there will be four parallel lines 
involved if the response is trichotomous, the Test is a dilution of the 


Standard, and the response-dosage relation is linear. 


Let the two lines corresponding to the Standard Preparation in 
the case of dead and of dead-plus-moribund flies respectively have 


the equations 
vs as? + Bz, 


Q (2) 
=ag + Bz. 


(17) 
(18) 


Similarly, let the corresponding equations for the Test Preparation be 


qi) a 
vy’ = az + Bz, 


vy = a; + Bz. 


(19) 
(20) 


The logarithm of the relative potency of the Test with respect to 


the Standard Preparation is given by 


| 
| 
| 
; 


POLYCHOTOMOUS QUANTAL RESPONSE 387 


q) (2) (2) 


Since we may regard 
ar + a5” (22) 


(2) 


as expressing a; in terms of the other parameters, we consider the 
estimation of the four parameters in 


(23) 


The x’ (Normit) expression we shall use for this purpose is 
x" (Normit) = —as? — 


+ — as’ — — as? — Bra.) 

+ — as? — + — ap? — 

+ — ar — — + as? — as? — 
+ — ar + ag? — ag? — 


Minimization with respect to af? , a , af? , B yields consistent 
asymptotically normal efficient estimators 


6=C'y (25) 


(24) 


with C and y given as follows: 
Cis Ciz Cie | 
C = Cas 
Car Coe 
Can Ces 


Cy KH (26) 


+ Wrias), Cy = — Wriaa), 
= + Wiss), > N(Wriis + Weiss); 


Gs = p> nN + Weise), Can = n(Wrer + + 
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+ — + Wri22)]; 


(Wsire + Wsi22) + Ir(Wriie + Wri22)); 


“M> -M> 


k 
Cu = n[zs(Wsirr + + Wsi22) 
1 


+ + WwWrise + 


a) (2) (2) 
+ — — } 


a) (2) a) (2) 
N{WsiWss + + + } 


ni fore (Wren + + + Wri22)} ‘ (27) 


“Mr -M- 


+ + + Ws i22)} 


L + tri + + + 


The weights wsio, , Wrion are given by (4) with obvious modifications 
of the subscripts. 
The above estimators may be used to estimate u in (21) by 


(2 (2) 
= 


(28) 


A 


which has asymptotic variance 
Var (a) = 5 (Var (a? — a” 


— 2 Cov — a3”), + Var 


For the purpose of comparing a parallel line assay based on a 
trichotomous response with that based on a dichotomous response 
obtained by pooling outcomes we shall consider the ratio of asymptotic 
variances of comparable estimators of yu, the logarithm of the relative 
potency. Since the analysis of a dichotomous response is usually based 
on pooling the moribund with dead flies, we shall consider this case 
here. 

Suppose the lines’ 
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(2 q@ (2) (2) 
vs’ = as’ + Br, vr =ar + Br 


of (18), (20) respectively are estimated by minimum Normit x’ based 
on a dichotomous response (cf. Berkson [1], [2]), yielding estimators 


(30) 
say. Similar to (28), » may be estimated by 

M® = (31) 


which has asymptotic variance 
Var War (a? as’) 


— Cov — as”), + Var (b™)). (32) 
Consequently, the ratio 


Var 
Var (A) 


_ Var (ar? — ag”) + Var (b™) — Cov [(ar’ — as’), 
Var — + Var (8) — 2u Cov [(@r’ — a5”), 


(33) 


will provide a comparison of the methods using a dichotomous response 
obtained by pooling and a trichotomous response. 

An over-all x’-test, analogous to (10), for a parallel line assay based 
on a trichotomous response is given by 


k 
+ QWs v's. + Ws } 
(34) 
4 
+ + vr: + WriaWre y; 9; 


which is asymptotically distributed as x’ with (4k — 4) degrees of 
freedom. 

In addition to (or perhaps instead of) an over-all test such as above 
one might wish to test hypotheses such as equality of slopes for dead 
and moribund or such as equation (21). This can be accomplished 
by application of minimum x’-techniques (cf. Neyman [11], Gurland 
[7]) but is not included here for brevity. it is planned to submit further 
aspects of the problem of polychotomous quantal responses for sub- 
sequent publication. 
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4. Application 

The following biological assay data, Table 1, are typical of Guthion 
residue experiments carried out by Pfaeffle [5]. The Test Preparation 
is an extract prepared from alfalfa sprayed with a given amount of 
Guthion. The purpose of the bioassay is to estimate the amount of 
Guthion residue present in the Test Preparation. For comparison, 
the Standard Preparation of Guthion is made to contain a given amount 
of control extract (10 ml. in this case) in order that the masking effect 
due to plant lipids and other inactive substances present in the Test 
Preparation be the same for both Preparations. As is evident from 
Table 1, all doses of the Test Preparation contain the same total amount 
of plant extract (10 ml. in this case), one part due to the test extract 
itself, and one part due to the control extract added. The flies were 
observed at 17 hours of exposure to count the numbers alive, moribund, 
and dead respectively. 


TABLE 1 
Data For EXPERIMENT No. 1. DETERMINATION OF GUTHION RESIDUE. 


STANDARD PREPARATION 


Flies Moribund Dead 


Jar Dose exposed flies flies 
1 20 wg. + 10 ml. control extract 50 1 5 
2 35 wg. + 10 ml. control extract 50 1 21 
3 45 pg. + 10 ml. control extract 50 7 35 


TEST PREPARATION 


Flies Moribund Dead 
Jar Dose exposed flies flies 


1 1.0 ml. + 9.0 ml. control extract 50 
2 1.5 ml. + 8.5 ml. control extract 50 
3 2.0 ml. + 8.0 ml. control extract 50 


one 


12 
28 
36 


If we minimize the Normit x’ in (24), we obtain for the matrix C 


1063.00168  —1022.14362  —41.79567 —_59.00225 | 
= | —1022.14362 6.82813 |. 
— 41.79567 41.79567  90.32725 
| 59,0025 66.82813 14.77280 197.54862_ 


if 
| 
| | 
«| 
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Consequently 
.84991 84765 09061 —.54739 | 
C= 84765 84633 08996 — 54621 | 
-09061 .08996 .02109 — .05896 
|— 54739 —.54621 —.05896 35774. 
For the vector y in (27) .we obtain 
 —91.30644 | 
83.37266 
9.19128 
3.12024_| 
and the minimum  x’-estimate of 
6= as” becomes 6= Cy = 7.718 |, 
ay — .763 
5.016 | 
For the estimate f of the log relative potency we obtain 


If we pool the numbers of moribund and dead flies and apply an 
analysis based on a dichotomous response, we obtain the following 
minimum Normit x’-estimates of a2 , a , B respectively: 


as’? = —8.580, ary = —0.694, b® = 5.622. 


Thus, for the estimate M of u, we obtain M® = 1.403. On referring 
to (33) and estimating variances of regression parameter estimates by 
the usual weighted least-squares technique inherent in the use of 
minimum Normit x’, we obtain Var (M‘)/Var (@) = 1.14 which 
indicates a smaller variance if a trichotomous response is used. The 
x’-test of fit based on a trichotomous response gives, on applying (34), 
xis) = 12.69. The x’-test of fit based on a dichotomous response 
yields = 4.01. 

It will be noted both these x’-values are not significant at the 
5% level; however x‘,) is significant at the 10% level whereas x‘;, 
is not. The total x’-values with degrees of freedom 18 and 48 respec- 
tively, obtained by adding the individual x’-values in Table 3 appear 
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at the foot of the respective columns. Here, also, it is evident the 
Xias)-Values are significant at a much lower level than the x‘,,)-values. 
This reflects the greater sensitivity of the x’-tests based on larger 
numbers of degrees of freedom; however, one must be cautious in 
making such a conclusion since a more significant x’-value might appear 
because the model assumed could be deficient. 

It is well known that the logit transformation will usually yield 
results in agreement with those obtained using the normit transfor- 
mation. (cf. Finney [6]). In the case of a trichotomous response 
agreement between the two transformations is also to be expected. 
From the above experimental data we obtain the following estimates 
based on logits: 


p=1.409; M”® = 1.408 
for the ratio of asymptotic variances we obtain 
Var (M)/Var = 1.17 
and for the x’-test of fit we obtain 
Xie) = 11.58, = 3.75. 


These results are in reasonable agreement with those obtained using 
normits. 

The data of other bioassay experiments of Pfaeffle [5] were analyzed 
in a similar manner using the minimum x’-techniques evolved here. 
The results are summarized in Tables 2 and 3. In Table 2, the esti- 


TABLE 2 
EstIMATEs OF Loc RELATIVE POTENCY AND COMPARISON OF VARIANCES 
Normits Logits 
Var (M Var (M“)) 
Var (A) Var (2) 
1 1.407 1.403 1.14 1.409 1.408 Bij 
2 1.264 1.246 ee | 1.262 1.244 1.14 
3 1.341 1.331 1.18 1.342 1.332 1.19 
4 0.826 0.830 ee 0.819 0.831 2.18 
5 1.422 1.431 1.18 1.426 1.437 1.19 
6 0.873 0.876 1.14 0.867 0.879 HE 


f 
| 
| 
£ 


POLYCHOTOMOUS QUANTAL RESPONSE 393 


mates of » based on dichotomous and trichotomous responses are 
shown, as well as the ratio of asymptotic variances for comparison 
of the methods. It is evident that the estimates of u are in good agree- 
ment, whether one uses normits or logits; but the variance using a 
trichotomous response is less than the variance using a dichotomous 
response. In Table 3 are given the x’-values for the over-all test. The 
x‘s)-column refers to the dichotomous response based on pooling mori- 
bund with dead and the x%,)-column refers to the trichotomous re- 
sponse. Again it is evident that normits and logits lead to similar 
results, and that the test using a trichotomous response is more sensitive. 
In particular, it is apparent that the x%,)-values for experiment 5 are 
significant at the 5% level whereas the x‘;)-values for the same experi- 
ment are not. 

Before concluding this section some comment should be made on 
how to proceed in obtaining minimum ’-estimates of the parameters 
when zero proportions or 100% proportions occur, e.g. 0 killed, 100% 
killed, 0 moribund, etc. One possibility would be to use covariances 
involving unknown parameters rather than use consistent estimates 
in the quadratic form (45) (See Appendix); however the minimization 
would not in general lead to linear equations as in the case of empirical 
weights. Another possibility would be to estimate the normits or 
logits for such responses by first fitting a straight line to the remaining 
points and then extrapolating for the extreme doses. Such a technique 
is commonly used in probit analysis as the first stage in an iterative 
procedure for obtaining maximum likelihood estimates (cf. Finney [6]). 


TABLE 3 


x? VALUES FoR OVER ALL TEST 


Normits Logits 
Expt. xis) xis) xis) 
1 4.01 12.69 3.75 11.53 
2 0.09 9.04 0.15 11.68 
3 0.17 5.39 0.20 5.38 
4 2.37 4.46 1.20 3.39 
5 4.94 20.72 4.23 20.11 
6 5.63 7.78 5.15 7.66 
Total 16.01 60.08 14.68 59.75 
P(x?) .59 .13 .68 .14 
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A simple procedure which might be used in conjunction with minimum 
chi-square and empirical weights could be based on the same type 
of teasoning as in the (1/2n)-rule formulated by Berkson [3]. Such 
a rule is easily extended to a (1/4n)-rule to allow for trichotomous 
responses, and we have found this procedure to work quite well. For 
example, if there are 0 dead, 3 moribund, and 47 alive flies at a dosage, 
the responses could be adjusted to } dead, 3} moribund, and 46} alive. 
Some research seems desirable however to provide rigorous justification 
for such ad hoc procedures. In fact, research on the small sample 
properties of the x’-test as well as the corresponding estimates would 
also be desirable. 


5. CoNcLUSION 


If the response in a biological assay is polychotomous, it is more 
efficient to use this information explicitly in analyzing the data rather 
than pool certain outcomes in order to make the response dichotomous. 
A general procedure has been described for dealing with any number 
s (> 2) of outcomes, e.g. different degrees of moribundity, but for — 
simplicity the technique has been presented for a trichotomous response 
of houseflies. A general extension of Berkson’s minimum Normit and 
Logit chi-square techniques is indicated in the Appendix. 


APPENDIX 


MINIMUM ,x?-ESTIMATION OF PARAMETERS FOR 
POLYCHOTOMOUS RESPONSE 


Suppose there are s possible ordered outcomes O, , --- ,O,. That 
is to say, if O; occurs, then all outcomes O;,, , O;42. , --- , O, have 
occurred previously. For example take s = 3, O, = death, 0, = 
moribund, 0, = alive. For s > 3, the O,’s could represent different 
decreasing degrees of moribundity as 7 increases. For consistency of 
notation, the last outcome O, represents alive. 

Let P; denote the probability of outcome O; at a given dosage z. 

For simplicity of presentation consider first an underlying normal 
distribution of tolerance dosages and suppose the outcome O; has a 
normal tolerance dosage distribution , 

Then 


The property of orderedness of these outcomes requires that the 
(s — 1) normal distributions have a common variance o”. Thus, we 
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may write 
P, = P(O, tolerance dosage < x] = ¢(a’ + 82), 
P, + P, = tolerance dosage < z] = ¢(a”’ + 
P, + P2 + Ps = P[O, tolerance dosage < z] = ¢(a™ + £2), (36) 


P, + = P[O,_, tolerance dosage < x] = ¢(a“~” + £2) 


where 
a? = —y/o, B=1/o. (37) 
Note that P, = 1 — ) {=} P;. Write 
V; =a? + pz (38) 
and consider the transformation defined by 
P, = ¢(V)), 


P, + P, = ¢(V2), 
(39) 


+ P,.; = $(V,-1) 


which may in fact be regarded as a transformation from probabilities 
, to normite V,,--- , V.-1. 

Let p; denote the relative frequency estimate of P; , ie. the ratio 
of the number of outcomes O; to the number n of independent trials. 

Apply the transformation in (39) to the observed proportions 
Pi, and denote the new variables by , , to indicate 
empirical normits. From the multinomial distribution it is evident 
the covariance matrix of 


is given by 
P,Q, —P,P,; —P,P.-,; 
—P,-,P, —P,_,P; 
where 


=1-P;. 
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Let J be the matrix given by 


oP, oP, oP,-; 
OV, eV, OV, 
aP, 
Then the asymptotic covariance matrix of 
[v, eee v,-1] 
is given by 
JAJ’. (42) 
It can be shown the inverse J’'A~‘J~' of this covariance matrix 


is given by 


where, for convenience, we have written 


= V;). (44) 


Let V = [V, --- V,_,] be the true normits given by (38). 
From the results in Gurland [7] or Barankin and Gurland [8] it follows 
that minimization of the quadratic form 


@ — V)J’* — V)’ (45) 


with respect to the parameters a“, --- , a", 8 yields c.a.n.e. (con- 
sistent asymptotically normal efficient) estimators of these parameters. 
It is also shown op. cif. that it is possible to replace the parameters 
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appearing in the covariance matrix (42) by consistent asymptotically 
normal estimators and still obtain c.a.n.e. estimators of a”, --- 

a“~”, g. Thus, the V; may be replaced by the v; and the P; by - 
a the minimization of the modified quadratic form in (45) will still 
yield c.a.n.e. estimators. All these estimators are also Fisher con- 
sistent (cf. Rao [9]) as well as consistent in the probability sense. 

For the case of a bioassay with k dosages x, , --- , x, and three 

outcomes dead, moribund, and alive, the modified matrix (43) corre- 
sponding to the i" dosage z; would be 


(46) 
Wi12 Wi22 
with the weights w,,, given by (4). Consequently, minimization of 
the quadratic form (5) yields c.a.n.e. estimators of a‘, a’, B. 

If the tolerance dosages follow some distribution other than normal, 
with location and scale parameters yu, o respectively, the appropriate 
changes in the quadratic form (45) are immediately evident. Thus, 


for example, if the logistic replaces the normal distribution, we would 
use logits L,; in place of normits V; , where 


(47) 


and J; is the empirical logit. The form of the corresponding matrix 
(46) is then indicated by (15). 
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ESTIMATION OF REGRESSION SLOPE FROM TAIL 
REGIONS WITH SPECIAL REFERENCE TO 
THE VOLUME LINE 


M. L. Dupzinsx1 


Division of Mathematical Statistics, C.S.1.R.0., 
Canberra, A.C.T., Australia. 


SUMMARY 


The approximate method given by Bartlett [1] for estimating the 
slope of a regression line from the join of means of tail regions for 
evenly spaced values of the predicting variable is extended to some 
bivariate distributions with constant or changing array variances. The 
application of this procedure to the fitting of a regression line to com- 
mercial volume on sectional area at breast height in even-aged stands 
_ of timber is examined in detail. 


INTRODUCTION 


Interest in the use of an approximate method to determine the 
slope of a regression line by the joins of means of tail regions was 
aroused by a recurring problem of fitting regression lines to commercial 
volume in stands of even-aged trees. This linear relation between 
commercial volume and sectional area at breast height will hereafter 
be referred to as the Volume Line. 

It is recognised that in general this relation is not in the strictest 
sense linear in that the volumes of the very smallest trees can be greater 
than expected while a few very large trees may have volumes less 
than expected from a linear fit to the central body of trees. In the 
data given subsequently abnormal trees such as double leaders were 
omitted as well as the very few trees with diameters less than 7.5’. 
The data have been used to illustrate the merit of the estimation of 
slopes from tail regions in data in which there is an indisputable linear 
relation between the dependent and independent variates and array 
variances increasing with the dependent variable. 

This characteristic heteroscedasticity in Volume Line data has been 
consistently ignored in discussions in the literature on fitting the Volume 
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Line or more complex expressions involving height explicitly. For 
greater efficiency in estimation of the Volume Line, the change in 
array variance should be taken into account by the use of weights 
inversely proportional to the array variances. This implies the esti- 
mation of an appropriate system of weights through using the deviates 
from a first approximation to the Volume Line, and possibly adjusting 
these weights by a second stage approximation. The computational 
work involved in such an iterative process is laborious and may account 
for the neglect of the changing array variance. One consequence of 
this neglect is that the use of the least squares solution to the fitting 
of a line with equal weights throughont leads to an under-estimate 
of the variance of the computed slope and to a biased estimate of 
the variance of the estimated mean volume at different basal areas. ~ 

A short-cut procedure has been given by Bartlett [1] for estimating 
the regression slope for values of y equally replicated at uniformly 
spaced values of z and constant array variance. The data are divided 
into three groups and the means of the two tail regions #7, and Z393 


determined. The slope is estimated as (g, — 9:)/(; — %,). The - 


maximum efficiency of this estimate relative to the least squares fit 
is 88.9% when the two tails each contain one third of the observations. 
This investigation has been extended to the Normal, Triangular and 
certain #-distributions by H. Thiel and J. van Yzeren [2] and to a 
variety of other distributions by W. M. Gibson and G. H. Jowett [3], 
in all instances with constant array variances. 

In Table I are given optimum values for the size of the tail regions 
and corresponding efficiencies for two -distributions with constant and 
changing array variances. 


TABLE I 
Optimum Tait Reaion Sizes AND EFFICIENCIES 

Optimum size of tail 

Form of Array Maximum regions as proportion 
Distribution Variance Efficiency of population 

Lower Upper 
B(4, 4) Constant 83.8 .29 .29 
B(4, 4) kx} 80.4 .22 .35 
B(4, 4) kx 76.3 15 
B(3, 5) Constant 83.6 31 .26 
A(3, 5) kx 80.4 22 .34 
B(3, 5) kz 75.9 13 Ad 
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The functional forms for the variances, proportional to z'” and 
x, were chosen for convenience in illustration and more usually in 
practice a linear or quadratic expression would be appropriate. The 
effect of increasing array variances is to increase the contribution of 
the upper tail relative to the lower. The efficiency is still satisfactory 
for high rates of change and has encouraged the examination of data 
on fully measured stands of timber for a comparison of the precision 
of this method with a least squares fit of the regression line with changing 
and constant weights. 


DATA 


The data, kindly made available by the Director General, Forestry 
and Timber Bureau, Canberra, comprise measurements of individual 
trees on three plots of Pinus radiata, Mt. Stromlo plantation, Australian 
Capital Territory, as follows:— 


(i) 187 trees, Compartment 51, planted 1934 at 12’ x 12’, thinned 
several times and measured standing in 1954. Parameters of 
sectional area at breast height: Mean (<) = .6065, S.D (c) = .2073, 
skewness (71) = .9390, kurtosis (y2) = —.0695. 

(ii) 140 trees, Compartment 30, planted 1926 at 12’ X 12’, thinned 
several times, felled and measured in 1950: 


#= .7539, o = .1786, = .43382, Yo = .3038. 


(iii) 186 trees, Compartment 9, planted 1917 at 6’ X 6’, unthinned, 
felled and measured in 1941: 


= 5783, o = 1484, .7084, v2 = .3005. 
METHODS 


A regression line was fitted to each set of data by least squares 
ignoring variation in array variances. The squares of deviates of 
actual values from the corresponding estimates were then examined 
for a functional relationship to diameter. No generalisation of this 
relation seems to be possible, at least in the case of stands which may 
be subject to thinning. Tor two of the three sites it was a straight 
line. For the third it was approximately quadratic. The weights 
were estimated as the reciprocals of the array variances computed 
from these relationships. The estimation of weights was sharpened 
by recycling with the use of deviates in the second stage from a weighted 
regression line with weights from the first stage. Two criteria of good- 
ness of a weight function are that (a) the weighted mean residual 
variance about the weighted regression line should approximate closely 
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to unity and (b) the sum of the array variances estimated as the sum 
of reciprocals of weights should approximate closely to the sum of 
squares of deviates from the weighted line. These criteria were reason- 
ably satisfied for the two sites, Compartments 51, 30, with linear 
relations between array variance and diameter. For Compartment 9 
some slight graphical adjustment was necessary at the extremes to 
the quadratic relation to achieve this result. The relevant information 
is given in Table II. 


TABLE II 
Goopness or WEIGHT FUNCTION 
Estimated 
Weighted Array 
Stand Residual M.S. Variances | > (y — fu)? 
Cpt. 51 Mt. Stromlo 1.0109 553.080 552.812 
Cpt. 30 “ 1.0189 379.020 379.272 
Cpt. 9 “ id 1.0010 312.308 312.577 


A weighted regression line was fitted to the data using as weights 
the reciprocals of estimated array variances. For the weighted re- 
gression line, with sample size n: 


where 
or 


wiles — wiles — 
where 9; ig the population array mean at z; . For repeated sampling 
at theame z-values, the expectation of the second term on the right 
is zero since the sum over each array is zero and the value of the first 
term is 8, the slope for the population. The variance of b, is the 
expectations of the square of the second term or 
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For the unweighted regression line: 
| 


t=1 t=1 


The expectation of b is 6 and the variance of b is the expectation of 
the square of the second term or 


> (2, — — 9.) / a] 


If it is incorrectly assumed that E(y; — 9,)’ is constant (o”) over all 
arrays, then the expression becomes o°/)>."., (xz; — where o° is 
estimated as the residual mean square. For the slope estimated from 
the join of the means of the tail nee 
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£, 

where u and J are the number of items in the upper and lower tail 
regions. The expectation of b, is 8 and the variance of }, is the expecta- 
tion of the square of the second term or 
1 n 1 t 
Ey; — +z Ey; 9:)° 
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The sizes of the tail regions u and | were determined by iteration to 


minimize the variances of the slopes. 
The efficiencies of the estimate b, were calculated in terms of Var. (b,,) 


and unbiased Var. (b): 
E = Var (b,)/Var (b,.), = Var (b)/Var (b,). 


The under estimation of the error of the unweighted slope based on 
the standard formula for uniform array variances is noteworthy. It 
is 24, 4 and 37 percent for Compartments 51, 30 and 9 respectively. 
The average efficiency of the tail region estimation relative to the 


RESULTS 


TABLE III 
CALCULATIONS FOR THE THREE COMPARTMENTS 


Un- | Weighted | Join of 
Sample Re- Tail 
Size i gression | Regions 


137 .5258 - 6065 
12.6387 | 15.3518 
—5.0824 | —5.1085 
33.7351 | 33.7351 


13.8972 


of data | E BE} 
Cpt. 51 
85.8 | 102.7 
var (b) 
unbiased .9162 .7661 .8925 
7 var (b) 
: biased .6950 
Cpt. 30} 140 z .7539 .7210 .7539 

16.1434 | 15.3673 | 16.1434 
4 a —1.5396 | —1.5668 | —1.2023 
3 b 23.4554 | 23.4869 | 23.0079 | 78.8 | 83.1 
var (b) 
unbiased .6395 .6068 .7697 
= var (b) 
biased .6153 
Cpt.9 | 186 5210 .5783 
13.8972 | 12.4270| 
a — 1.0387 —.7694| —.5589 
b 25.8272 | 25.3309 | 24.9975 | 91.1 | 124.2 
var (b) 
unbiased . 6589 .4832 . 5306 
var (b) 
biased .4129 
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unweighted regression over the three sets of data is 103 so that there 
is virtually no difference in the effectiveness of those two procedures. 

The maximum efficiencies for b, were given when the percentage 
of observations falling in the lower and upper tail regions are as listed 
in Table IV. 

In the regions of optimum values for upper and lower tail sizes, 
the inclusion of an additional observation in the lower tail will tend 
to have a greater effect on the efficiency than an additional observation 
in the upper tail. The effectiveness for three selected sizes are given 
in Table IV to give some measure of sensitivity to departure from 
the optimum. j 

It is desirable to be able to make a recommendation for a specific 
size to the upper and lower tail sizes. For the three sets of data the 
average sizes for the lower and upper tail are 22.6% and 34.9% re- 
spectively. Convenient values to remember are 3 and 3 respectively. 


CONCLUSIONS 


The situation is by no means uncommon that in a regression re- 
lationship the array variances increase with the predicting variable and 
this aspect is frequently ignored as a matter of convenience. With 
a genuine linear trend this will lead to an unbiased but less efficient 
estimate of the slope than if computed with appropriate weights. 
Moreover the use of the standard formula for the error of the unweighted 
slope will be an underestimate of the true value, sometimes seriously. 
The construction of the weight function is itself laborious, especially 
when the form of the function cannot be generalised over apparently 
similar data. Collectively these factors favour the use of the tail 
region method of estimation of slope. 

For the data on the Volume Line which has been presented, the 
efficiency of the tail region estimate of slope at optimum tail size is 
of the order of 85% of the weighted regression slope, but it is as effi- 
cient as the estimate of the unweighted regression and involves decidedly 
less computation. 

The virtue of the method would be defeated if it was necessary 
to determine the optimum sizes for each individual case, but the data 
indicate that the efficiency is not very sensitive to moderate departures 
from the optimum. It is tentatively recommended for P-radiata in 
old stands that the sizes of the regions be } of the sample size for the 
lower tail and 4 for the upper tail. 
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EFFICIENCY OF ANIMAL TESTING SCHEMES 


CHARLES SMITH 


A. R. C. Animal Breeding Research Organisation, 
Edinburgh, 9, Scotland. 


I. INTRODUCTION 


The use of some method of testing is common in current schemes 
of livestock improvement. The problem arises of finding the testing 
procedure which is most efficient in terms of the improvement it provides. 
This efficiency is relevant at three levels: in determining the optimum 
design of a particular scheme, in comparing the improvement expected 
from different schemes, and in planning how a testing scheme can be 
best integrated with the population as a whole. Concerning the first 
level, Robertson [1957] and Finney [1957] derived expressions for finding 
the optimum test structure in animal and plant breeding respectively, 
so as to maximise the true superiority of chosen families or varieties 
tested with a given amount of testing facilities. An optimum point 
exists because of the antithesis under these conditions of the accuracy 
of selection, which increases as test group size increases, and the choice 
available, which decreases as test group size increases since fewer 
family units may be tested. 

In practice the requirement of breeding animals is likely to be in 
terms of individuals rather than of families, say as the number of 
males required in the breeding herds. This variation in requirement 
is incorporated in an extension to the previous work in which the 
optimum test design and maximum gains are studied when selection 
comprises (i) non-tested members of selected families (ii) all members 
of selected families and (iii) only tested animals. In the second part 
of the paper we consider how best to use the testing facilities and 
selected individuals so as to produce the maximum improvement in 
the breed or population as a whole. 

The expressions for genetic gain used by the above authors assume 
normality of distribution of breeding value. This assumption is 
likely to be a reasonable approximation, and moreover no more exact 
treatment is yet possible. Following the approach in Robertson’s 
[1957] paper it can be shown that the genetic superiority of chosen 
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ANIMAL TESTING SCHEMES 


groups is given by 


where p is the proportion of groups selected, 2 the corresponding ordinate 
of the normal curve, r the genetic relationship between members of 
the same group, / the observed intra-class correlation for the character 
concerned, h* the heritability, o¢ the genetic standard deviation, and K 
the ratio of the number of animals tested to the number of groups 
chosen. a is then (1 — ¢)/¢ and the group size 7 is equal to Kp. The 
optimum value of p is then given by 


K _ (2) 
— px) 


where x is the abscissa of the normal curve corresponding to p. The 
maximum value of AG can then be found by substituting the optimum 
p value in equation (1). Over a wide range of K/a > 5, AG,,.x. is a 
linear function of log (K/a). 


II. REQUIREMENTS ON NUMBERS OF INDIVIDUALS 


We now turn to consider the situation when the requirements are 
expressed as the number of individuals required for breeding. Though 
the requirement is stated as total individuals, the basis for selection 
is still the family average and the requirement may be made up of 
tested animals, or untested animals, or both, from the selected families. 
The three different. cases are discussed separately. 


(t) Only non-tested members of selecied avartabli 


This will arise when tested animals are not available to. breeding 
as when they are slaughtered for carcass measurement It the tots 
family size is m, we have 


aT 
(m + a)N 


where N is the total number of animals tested, and 7 the total required 
for breeding. This equation is of similar form to equation (1) except 
for the term in m and a and that K ‘a now becomes N(m + a) ‘aT 
(iz) All members of selected families availabl: 


Here 
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which is again of similar form with K/a becoming Nm/Ta. 
(tit) Only tested animals available. 


This is a special case since the proportion of animals selected is 
now independent of family size. We have here merely to maximise the 
accuracy of choice. Here 


[| n | 
= n Det hao (5) 
with p = T/N. The expression in square brackets takes the form of 
a U-shaped curve passing through a minimum when n = (1 — é)(1 — r)/ 
(r — 2¢ + rt). From the U-shaped nature of the curve it follows 
that the value of n which maximises improvement will be either 1 or 
m, since, in testing, n must be a positive integer between 1 and m. 
If n at the minimum is less than one, n at the maximum will be m. ~ 
If n at the minimum exceeds m, n at the maximum will be 1, and if 
n at the minimum lies between 1 and m, n at the maximum will be 
either 1 or m and which can be easily determined. 


III. COMPARISONS OF TESTING PROGRAMMES 


With the formulae (3), (4), and (5), we can now compare the prob- 
able gains from different testing programmes. The approach used 
was to specify a wide series of testing situations for pigs and to find 
the maximum gains attainable with optimum design for different 
testing procedures. These results may be seen in Smith [1958]. Here 
several graphs are presented to show the main features of the findings. 
Tn the figure the abscissa measures the number of animals tested N 
relative to the number required for breeding T on a logarithmic scale 
covering breeding requirements of 1 to 10,000 per 100 animals tested. 
The maximum genetic gains are in units of hog and so apply to all 
levels of h’ except where h’ affects the correlation ¢ between members 
of a test group. The graphs are these calculated for dam families 
of 4 and 10 (¢ = .2 and .3) and for sire families of 16 (¢ = .1) as are 
possible with pigs. 

The graphs in the figure bring out clearly the almost linear relation- 
ship between the genetic gain and the logarithm of the ratio N/T’, 
showing as would be expected, that this ratio and the heritability are 
the primary factors in determining the extent of the improvement. 
If we are considering a fixed value of h’, a decrease in ¢ implies a decrease 
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Tue GENETIC Garns, IN UNITS OF hog , FROM TESTING IN THE SPECIFIC CasES 
LisTED ABOVE AND IN THE TEXT 


in the environmental variance common to families and consequently 
an increase in information per family member tested and an increase 
in genetic gain as in Graphs III versus IV. As family size increases, 
fewer families need be selected to supply the breeding requirement. 
Thus either the choice of families or their accuracy in selection, or 
both, can be improved giving greater improvement, as in Graphs I 
versus III. A further relevant comparison concerns the type of family 
tested. Full-sib families are small but have a higher genetic relation- 
ship than half-sib families which in turn are larger. As would be 
expected, testing full-sib families is superior at high values of N/T 
where the breeding requirement is small and accuracy in selection is 
important, and ib family testing is preferable at low values of 
N/T as in Graphs III versus V and VI versus VII. The values ascribed 
to m, r and ¢ may alter the cross-over points of these graphs appreciably 
and will determine which testing procedure is desirable for a given 
ratio of N/T. 

We now look at how the improvement is affected by which animals 
in the selected families are used for breeding. We compare the cases (A) 
when only tested animals are used, (B) when all members of selected 
families are used, and (C) when only non-tested members of selected 
families are used. The advantage of (B) over (C) is to -be expected 
since in (C) we are throwing away the tested animals on which the 
information is direct besides reducing the numbers available for breed- 
ing per family and necessitating the selection of more families. As 
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tested animals constitute a decreasing part of the total number selected, 
the actual difference between the (B) and (C) graphs declines, though 
the relative advantage in terms of the total gains expected may either 
increase or decrease. Selection of tested animals alone (A) is slightly 
more efficient than selecting the whole family (B) at high levels of 
N/T, as in Graphs II versus III, but obviously becomes very inefficient 
as T approaches N. In general, we may conclude that, if the tested 
members of the selected families are not used for breeding, the genetic 
gain will be less than is otherwise possible. 

We have considered selection based solely on the family mean, 
whereas selection will be more effective if based on an index of indi- 
vidual merit and family average. However the discrepancy is not 
serious. When N/T’ is small, the outstanding individuals of poor 
families would form a low proportion of the total number selected 
and their omission will not affect the improvement by much. On the 
other hand when N/T is large, accuracy in selection is important and, 
to obtain this, large fractions of each family are tested. Consequently 
selecting on an index which includes individual merit will add little 
to the improvement from selecting on family means. 

Changes in the variables may be sought to improve the gains from 
testing according to the above findings. Unless the breeding require- 
ment is small, changes in the ratio N/7 to bring appreciably greater 
gains may be costly and it may be more expedient to reduce the size 
of T in some way (see below). The family size and the test group 
size can be varied at will within the reproductive limits of the species 
and should be chosen to make the designs efficient. 

We have considered selection for a single trait, but in farm animals 
improvement is generally sought in several traits concurrently. If the 
various traits can be combined in one statistic, for example the selection 
index, the above treatment will apply. 


IV. THE INTEGRATION OF A TESTING SYSTEM 
WITH THE POPULATION 


We have so far considered only the genetic merit of the selected 
animals. How should these be used so as the maximise the improvement 
of the breed or population as a whole from repeated testing? To 
answer this question we must study the accumulation of improvement 
from repeated testing and its dissemination throughout the breed. 

To study the accumulation of gains from testing we must make 
some assumptions about the size of the gains in successive generations. 
Our knowledge of the change in the genetic parameters and response 
on continued selection in farm animals is scanty. However, where 
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several traits must be considered concurrently, the selection pressure 
on any one cannot be very great. I‘urther, the intensity of seleetion 
possible is further limited by a slow reproductive rate in farm animal: 
On these two points it seems reasonable to expect the initial gains te 
be continued for several generations. With this assumption, it is easy 
to show that the improvement accumulated after n generations of 
selecting malés is (n/2)AG, where AG is the gain per generation. If 
only a fraction qg of all sires are selected by testing, the others being 
selected at random, the improvement will be (n/2)q AG. 

When we come to the dissemination of improvement throughou: 
the breed we are faced with two alternatives. Should the use of the 
testing facilities and selected animals be restricted to a small section 
of the breed which can act as a nucleus for the supply of breeding stock 
to the rest of the breeding herds? Or should the facilities and selected 
animals be unrestricted or open to all breeding herds? A nucleus plan 
is quite feasible in practice with farm animals. A stratified arrange- 
ment of breeding operations is clearly defined in many breeds with a 
flow of breeding stock from few elite herds to the many lower breeding 
herds. A practical example of a nucleus system is found in the Danish 
Landrace pig where official testing is restricted to the 250 State breeding 
centres. The open plan would ignore the presence of breed structure 
and assume equal use of selected males as sires for the next generation. 
In practice, this would rarely be the case. Instead, selection will be 
only partly on test results and disproportionate use is likely to be made 
of males, selected and unselected, from the elite herds. Unless the 
elite herds are genetically superior, this practice will reduce the gains 
possible by the open system proper. 

Suppose we employ a nucleus which supplies all the males needec: 
for the other breeding herds. Let us write H, and M,, for the genet: 
levels of the nucleus proper and the rest of the breeding herds . 
spectively in the nth generation. Then H, is equal to (n/2)AG. \ 
have M,,, = 3(H, + M,) and substituting for H, and simplifying, 
M,., = 3 AG[n — 1 + (3)"]. If n is large, we can say that the tw: 
sections are improving at the same rate and that the rest of the breedny: 
herds lag behind «ne nucleus by two generations 

When considering the open system, we are concerned with , 
whether it is better to supply all the breeding males requir 2h 
though their average merit will be low, or to supply only a prop: ior 
whose average merit will be higher. I{ we supply a propor! on gy 
the average merit of males used is g AG. Knowing that AG « cline: 
as q increases, we wish to know whether there is a maximu: q AG 
It turns out that q AG increases continually as y ineress: so that 
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the best value of g is one. We should thus provide all the males re- 
quired in the breeding herds. An exception to this rule is when, to 
make gq equal to one, we have to use animals which are poorer than 
average. In this case the maximum gains are when all animals which 
are above average are used though g will be less than one. 

Now let us compare the respective merits of the nucleus and open 
systems. Since the nucleus will require only a fraction of the breeding 
males which the open system needs, the rate of improvement will be 
greater in the nucleus. Let us take AG, as the rate of improvement 
in the open system and AG, that in the nucleus system, and AG, > AG, . 
We have shown that AG, will be proportional to log (N/T), where 
as before N is the total number of animals tested and 7’ the total 
requirement for breeding, and AG, will be proportional to log (Ns/T) 
where s is the number of breeding sons a sire in the nucleus group 
supplies to the rest of the breeding herds. That is, the use of a nucleus 
system will increase the rate of improvement to the same extent as 
providing s times the original testing facilities. The difference in the 
rates of gain under the two systems is proportional to log s, and roughly 
equal, per unit of log s, to the gradient of the plot of AG with log (N/T). 
Some examples of the increased gains possible by the nucleus system 
can be studied in the figure taking values of s of 10 and 100 say at 
different levels of N/T. Though the rate of improvement is greater 
in the nucleus system, there is a two-generation lag between the mean 
of the nucleus proper and the rest of the breeding herds. Because of 
this lag, which does not occur in the open system, it may be 4-5 gene- 
rations before the nucleus plan becomes superior but, because of its 
greater rate of improvement, it will increase its superiority with each 
successive generation. 

The smaller the nucleus the greater will be the rate of improvement 
because the smaller will be the requirement of tested animals. How- 
ever, if the nucleus becomes too small, we come up against the problem 
of inbreeding. Further, the nucleus should be large enough to supply 
all the remaining breeding herds with males, unless we have a nucleus 
within the original nucleus but here again inbreeding would be relevant. 
We reach the conclusion that the nucleus system of breeding is pre- 
ferable and the nucleus should be of a size specified by the above pro- 
visions and that all testing facilities be entirely devoted to testing 
animals from the nucleus for future use in the nucleus. 


V. SUMMARY 


In an extension of Robertson’s [1957] work the optimum test design 
and maximum improvement were studied when selection comprises (7) 
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non-tested members of selected families, (72) all members of selected 
families, and (zz) only tested animals. A variation in considering the 
breeding requirement as a total of individuals rather than of families 
was incorporated. Of the three cases the maximum gains are likely 
to be obtained when all members, tested and untested, of the selected 
families are used. The total improvement is largely determined by 
the ratio of the total number tested to the number required for breeding 
and by the heritability. The efficiency of the test is further affected 
by the size and nature of the family and by the test group size. These 
details can be varied to some extent to fit the requirements of efficiency. 

Consideration was also given of how best to use the testing facilities 
and selected individuals, so as to produce the maximum improvement 
in the breed or population as a whole. Maximum improvement will 
be obtained if testing facilities are restricted to a nucleus group of 
breeders, who in turn supply the rest of the breeders with breeding 
stock, so that full opportunity for testing and selection is confined to 
the nucleus group. 
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A NOTE ON THE ADDITIVE PARTITIONING OF 
CHI-SQUARE IN CONTINGENCY TABLES 


Marvin A. KasTENBAUM 


Mathematics Panel and Biology Division 
Oak Ridge National Laboratory,* Oak Ridge, Tennessee, U.S. A. 


1. INTRODUCTION 


The exact partition of the total chi-square statistic in ar X s 
contingen¢y table in such a way as to afford comparisons among fre- 
quencies achieved by pooling adjacent rows and columns has been 
treated by Irwin [1949], Lancaster [1949, 1950], and Kimball [1954]. 
The statistics obtained by such partitionings may be used to test 
orthogonal contrasts of a particular type, each with one degree of 
freedom. In general, however, an experimenter may be interested in 
testing other orthogonal contrasts, many of which may be handled 
by the approach suggested by Irwin and Lancaster, or more simply 
by short-cut formulas such as those given by Kimball. 

The purpose of this paper is to present short-cut formulas for 
handling a broader class of orthogonal contrasts which may be tested 
with one or more degrees of freedom. The component chi-square 
values computed by these formulas will add exactly to the total chi- 
square as computed in the usual way. The most conspicuous omis- 
sions will be those contrasts whose analogues in the analysis of variance 
involve the use of orthogonal polynomials for separating out linear, 
quadratic, cubic, etc., sums of squares (Yates [1948], Cochran [1954)]. 


2. THE SINGLE DEGREE OF FREEDOM CONTRAST 


Let n,; be the observed frequency in the ith row and jth column 
of anr X s contingency table, and designate 


C,= and N= DR, = 


i=l i=1 


Then, on the hypothesis of independence of rows and columns, 


*Operated by Union Carbide Corporation for the U. 8. Atomic Energy Commission. 
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= R.C)/N. (2.1) 
A test of this hypothesis is given by the statistic 


t=1 
which has a limiting chi-square distribution with (r — 1)(s — 1) degrees 
of freedom. 
For each orthogonal contrast (with the exception of those just 
specified), the data in an r X s table may be partitioned as follows 


Su Sia Si. 


Sa Soe So. 


(2.3) 
Seri Sss | Bes i Ss. 
N 
where the 2 X 2 subtable, 
Sut Sia (2.4) 
Sort Sos 


represents an array of data for a single contrast. In general, 
Sn (h, k = 1, 2) represents the sum of the n,; in the (hk)th quadrant 
of subtable (2.4). The ;;’s to be summed in any one of these quadrants 
are determined by the exact nature of the individual contrast. The 
single degree of freedom chi-square associated with all such contrasts is 


x? N{S.,(81. S22 S2.Si2) S.2(S,. S21 


+ + Se) 
or, equivalently, 
x? N S25 S.1S12)? 
+ 8.2) Si. 
(S.2Sa — 
+ Se, (2.5.1) 


_W. + Sai) S. i(Si2 + S22)] “| 
Si. + So. 


Thus, for example, in a contingency table with four rows and five 
columns, two sets of orthogonal contrasts, one for rows and another 
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for columns, may be tested by applying formulas (2.5) or (2.5.1) to 
each of the twelve subtables in the body of Table 1. 


3. CONTRASTS TESTED WITH MORE THAN ONE 
DEGREE OF FREEDOM 


In many experimental situations, the primary interest is in contrasts 
involving the rows or the columns of a contingency table rather than 
a combination of both rows and columns. Consider, for example, 
the case of a 4 X 5 contingency table which results from the selection 
of five 4-celled multinomial samples of sizes C, ,--- , Cs. The null 
hypothesis 


Ho: =p, (for += 1,2,3,4, and j=1,--- , 5) (3.1) 


is that the five samples have been drawn from the same multinomial 
population with parameters p; , (>_‘., p; = 1). A test of this hypoth- 
esis is provided by equation (2.2), with 12 degrees of freedom. If, 
however, there is reason to believe that the five samples were drawn 
from two different multinomial populations in such a way that the 
first two samples were drawn from one population and the remaining 
three samples from the other, then the following three separate ” 
potheses may be tested: 


Hoa Pa = Da = (3.2) 
Hoo : Dis = Dia = Dis = Diy (3.3) 
and 
Hos ps = pi (for all 12). (3.4) 
A test of Ho, , with three degrees of freedom is given by 


4 


Xs = + R, (Ca — Crna)’, (3.5) 


or equivalently 


=N 4 Mis (3.5.1) 


C; C; + 
A test of H,; , also with three degrees of freedom, is given by 
N 
(C, + + Co + Cs) 
“(Cs + Cy + + m2) — (Ci + C2)(Nis + Nis + 
or equivalently 


Xi = 
(8.6) 
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Finally, a test of Ho. , with six degrees of freedom is given by 


2 _ 4 E Nia nis (nis + net | 
| 


In general, in the r X s table the contribution made to the total 
chi-square statistic by the first m < s columns is 


2 
(r-—1)(m-1) 


The contributions of the remaining (s — m) columns is 


s 2] 
r 1 n?, ( ns) 


i=m+1 


(3.9) 


The contribution due to the contrast of the first m with the remaining 


(s — m) columns is 
m 2 s 2 
= ND i= + 810 


> ¢, 


i=l i=m+1 


It should be borne in mind that, throughout this treatment, rows and 
columns may be interchanged without affecting the results. 


4. ILLUSTRATION 


In a stock of Drosophila melanogaster in which sperm of four dif- 
ferent genotypes were produced, Dr. D. L. Lindsley of the Biology 
Division (Oak Ridge National Laboratory) wished to study the in- 
fluence of irradiation on the relative frequencies with which the sperm 
types were represented in the subsequent generation. The variability 
among pair matings, however, was so great as to completely obscure 
any influence of the irradiation on the composition of the sperm popula- 
tion. 

After a genetic procedure involving approximately ten generations 
and designed to make the stock genetically homogeneous, five of the 
seven lines were still heterogeneous with respect to sperm types. The 
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TABLE 2 
DISTRIBUTION OF GENOTYPES IN TWENTY Parr MatTINGS 


Genotypes 


II + IV 


11 
22 38 6 66 
25 44 11 80 


CON 


over-all heterogeneity at this point seemed to be attributable to the 
combination of data on males from two separate populations. That 
is to say, of the 20 samples of observed sperm, the distributions of 
the sperm in 15 samples appeared to be homogeneous but different 
from the remaining 5 samples. To test the hypotheses related to 
these observations, the data were partitioned in such a way as to carry 
out the tests outlined in Section 3. 

To test Ho: = ps (fori = 1, 2,3; 7 = 1, --- , 20) use equation 
(2.2) and compute Xj, = 66.21. (P < .005). 

To test Ho, : pi; = p; (fort = 1, 2, 3;7 = 1, --- , 15) use equation 
(3.8) with m = 15, and compute 


1.9534 , 2.8559 3.0913 
431 671 195 


To test Ho. : p,; = pi (fori = 1, 2,3; 7 = 16, --- , 20) use equation 


X3. = 1297( ) = 31.96. (P = .3). 


421 
Pair Total 
Matings I II 
13 22 6. 41 
16 36 12 64 4 
27 37 23 87 a 
23 44 7 74 —_— 
4 14 1 19 ee 
10 20 50 16 86 pect oe 
11 32 48 10 90 Loe. 
12 19 37 il 67 
14 27 56 18 101 
15 12 18 37 
16 25 26 7 58 hae 
17 49 44 11 104 fs 
18 38 45 6 89 aint. 
19 18 21 9 48 ah 
20 39 38 9 86 
Total 431 671 195 1297 ae 
4 
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(3.9) with m = 15 and s = 20 and compute 


x? = 1297(9:3388 0.3626 0.4603 


) = 4.78. (7<P< 8). 


195 


To test Hy; : p; = pi (for i = 1, 2, 3) use equation (3.10) and 
compute 


431 


2284 . 2.3418 . 0.9319 
431 671 195 


These tests permit us to conclude that the data are consistent with 
the following assertions: (1) the 20 samples are heterogeneous (Test 1); 
(2) the first 15 samples are homogeneous (Test 2); (3) the last 5 samples 
are homogeneous (Test 3); and (4) the first 15 samples and the last 
5 samples are drawn from two different populations. When the total 
chi-square statistic is significant, there may be some question as to 
whether the additive subdivision is appropriate. For instance, if Ho; 
is false, X3, as computed above will not follow the standard chi-square 
distribution even if H, holds, although the disturbance to X3, seems 
unlikely to be serious unless H,; is violently wrong. Before drawing 
any definite conclusions, tests 2 and 3 were made by separating the 
data into two distinct contingency tables. Now X3, = 30.47 with 
probability (.8 << P < .4), and X% = 5.79 with probability (.6 < P < .7). 
The results of these tests indicated to Dr. Lindsley that his genetic 
procedure had been partially successful and that possibly only a single 
genetic factor remained to be removed. 
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X? = 1297(2 ) = 29.47. (P< .001). 
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THE TREATMENT OF RECIPROCAL INTERACTION, WITH 
OR WITHOUT LAG, IN PATH ANALYSIS* 
WriGcHtT 
Department of Genetics, University of Wisconsin 
Madison, Wisconsin, U.S.A. 


Turner and Stevens in a recent paper [1959] and Tukey a few years 
ago [1954] have proposed a modification of the method of path analysis, 
introduced by the present author [1921]. Their central thesis, the 
replacement of the standardized path coefficients by the concrete path 
regressions, has been discussed recently (Wright [1960]). In addition, 
both of the above papers discussed the application of the method to 
reciprocal interaction between variables (feedback). A somewhat 
extended discussion of this important topic seems desirable. The above 
paper (Wright [1960]) or earlier ones (Wright [1934, 1954]) should be 
consulted for reviews of the general method. 

The direct application to instantaneous reciprocal was 
avoided in these earlier papers because of a complication in method 
that seemed likely to create confusion. It seemed preferable to deal 
with two such interacting variables as both completely determined 
by the same factors rather than as acting directly on each other. On 
consideration of the papers by Tukey and by Turner and Stevens, I 
agree that there is more merit in the direct treatment of circular paths 
by path analysis than I had vreviously concluded. 


RECIPROCAL INTERACTION WITH LAG 


Reciprocal interaction among variables presents difficulties whether 
there is an appreciable lag between action and reaction or there is 
virtually instantaneous mutual adjustment to any changes induced 
in either. In the former case, the difficulty is that action of either 
variable on the other in most cases rises gradually to a peak and then 
gradually falls off, while the method of path coefficients applies strictly 
only to point variables. If one correlates measurements taken at 
fixed intervals, the correlations can hardly describe fully the course 


1Paper No. 765 from the Department of Genetics, University of Wisconsin. 
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FIGURE 1. 
Patu DiaGRAM REPRESENTING RECIPROCAL INTERAUTION 
Path diagram representing reciprocal interaction between two variables A, B with different lags between 


action and reaction. C and D represent known external factors that affect A and B respectively while 
U and V represent the unknown residual factors. 


of the interaction. If one uses totals or averages for successive periods 
of time, there is almost certain to be some inaccuracy of the sort ex- 
pected from composite variables. For the most accurate results, the 
interval between observations or the lengths of the averaged periods 
should equal or be an aliquot part of the lag. As there is not likely to 
be exact correspondence, representation is in general most adequate if 
the observation intervals are much shorter than the lags, but this gives 
a system with a very large number of paths and hence a complicated 
one to solve. 

Figure 1 illustrates a simple pattern of reciprocal interaction between 
two variables A and B that are also affected by external known variables 
C and D and unknown residuals U and V. It should be understood 
that the analysis of any actual case usually presents complications 
that must be carefully considered and dealt with by elaboration of the 
diagram as fully as is practicable. Primes are used to indicate pre- 
ceding values of each variable. The time intervals for action of A’ on 
B and of B’ on A or of C on A and D on B may all differ. It is assumed 
that the observations of each variable are made in accordance with the 
intervals of maximum effect. Thus A should be at the interval from 
B’, of peak effect of deviations of B’, and B’ should follow A” at the 
interval of peak effect of the latter on it. A’ must be taken half way 
between A” and A in order that a, and b, may each be uniform. It is 
convenient to use simplified symbols for the path coefficients as indicated 
in Figure 1. Following are some of the equations of the general types 
rhe = dos Putas and = piri, = | (in which the p,,’s are the 
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standardized path coefficients relating variable V, to the variables 
that are indicated as determining it directly): 


Tap = + , Tap = + , 
rear = + betas, Tap) = + 


Thus 
Tap = + + + 
(a,b. + azb,)/(1 = a,b, — Azb2), 


II 


Tape) = + , = tes 

Taare = + Tac’ = 

= + brea , Taco: = a,b,a5 + 
= + bores: , Tac =0, 

Tan = 1 = + Oran +03 + » Tac: = ba, 


1 = + + bs + bi » Tee 


+ , 
etc. 


These equations can be extended through as many cycles as desired. 
Correlations involving D are analogous to those involving C. The 
inverse problem, estimation of path coefficients from knowledge of the 
correlations at a succession of intervals, involves much overdetermi- 
nation. This is especially the case if one or more outside variables 
such as C and D are measured and correlated with subsequent values 
of A and B. 

These formulae can be applied to censuses of predator (B) and 
prey (A) if a, , 6, and b, are positive and a, is negative. In cases of 
competition, a, and b, are positive and a, and b, negative. In cases 
of mutually beneficial interaction all four of these path coefficients 
are positive, Figures 2, 3 and 4 give the succession of correlations with 
A or B at a particular time in examples of these three situations, in- 
dicated by the diagrams at the left. 

There is no difficulty in dealing with more complicated situations. 
Figures 5, 6 and 7 show the waves of correlations in time in a food chain 
of three species, under the diagram of relations at the upper left. 

The lags are taken as the same in these examples. If assumed to be 
different, the curves merely need to be displaced relatively by the 
appropriate amounts. 
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FIGRUES 2, 3, 4. 
CoRRELATIONS BETWEEN NUMBERS OF INDIVIDUALS AT SUCCESSIVE TIMES 


Correlations between the numbers of individuals of one species, (A or B) with that of the same or 
the interacting species at successive times under the systems indicated by the diagrams at the left. 
Figure 2 (top) deals with relations between predator B and prey A, Figure 3 with that between two 
competitors and Figure 4 with that between commensals. Correlation of A at time zero (Ao) with A at 
preceding, A’, A’ etc., or subsequent, A: , A: etc., times are indicated by solid lines, those of Ao with 
B by long dash and dot, those of Bo with A (mirror image of preceding) by short dashes and those 
of Be with B at other times by dashes except that in Figure 2 the correlations are the same as those 
between A’s (solid line). 


This mode of treatment differs from those introduced by Lotka, 
Volterra, and Nicholson and Bailey in being correlational rather than 
deterministic. This has an advantage where the causes of variability 


are too complex to be fully known. It resembles the last in the ease of 


dealing with lags, which require rather complicated treatment by 
integro-differential equations, by Volterra’s method. It has the dis- 
advantage that it is restricted to the treatment of fluctuations so small 
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8c, 


FIGURES 5, 6, 7. 

CoRRELATIONS BETWEEN THE NUMBERS OF INDIVIDUALS IN THREE SPECIES 
Correlations between the number of individuals in three species that form a food chain, at successive 
periods of time under the system at the left of Figure 5, Correlations of Co (top predator) are given in 
Figure 5: Co with A by long dashes and dots, Co with B by short dashes and Co with C by dots. Cor- 
relations of Bo (intermediate) are given in Figure 6: Bo with A by long and short dashes, Bo with B 
by short dashes and Bo with C by short dashes. Correlations of A (base of food chain) are given in 

Figure 7: Ao with A solid, Ao with B long and short dashes, and Ao with C long dashes and dots. 


that the nonlinear terms that enter into the deterministic expressions 
may be considered negligible. 

An inverse study along these lines was made some years ago of the 
relations among deviations from trend of a number of hog variables 
(price, pack, live weight and the product of the last two, each for summer 
and winter) and corn variables (price, crop, acreage, yield) involving 
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deduction of 13 path coefficients in a central system and 23 peripheral 
ones from more than 500 correlations (Wright [1925]). The inaccuracy 
from use of composite variables could not have been great since action 
of one variable on another was represented as extending over more than 
one season in most cases. There was an enormous amount of over- 
determination of the path coefficients. The analysis yielded a con- 
sistent account of the observed waves of correlation, corresponding to 
rapidly damped oscillations, two years from crest to trough, in the 
values of the hog variables, induced, with diverse lags, by deviations 
in the corn crop which acted largely as an outside factor (determined 
80% by deviations in yield and only 20% by ones in acreage). 


INSTANTANEOUS RECIPROCAL INTERACTION 


This study involved some cases of virtually instantaneous inter- 
action e.g. between hog pack and hog price. The most obvious mode of 
representation in a path diagram is by use of separate arrows, directed 
from each variable to the other. The rule 71. = >.; pia; still holds 
throughout a system with circular paths. If V, and V, are the re- 
ciprocally interacting variables, we also have ri. = >>; Poti; . These 
yield two independent equations from each known correlation in systems 
in which two or more interactions are linked. 

The usual analysis of correlations into compound path coefficients 
breaks down under the proviso that there shall be no passage through 
the same variable twice in the same path. Valid equations can, never- 
theless, be written by analyzing step by step the correlation terms 
T2; OF 7; in the above expressions. 

The difficulties encountered in trying to deal with reciprocal inter- 
actions embedded in a complex situation led the author at the time 
to reject this mode of treatment as impracticable. In the above study, 
pack and price were merely traced back separately to the same pre- 
ceding key variables. 

A method with considerable theoretical advantage was arrived at 
later (Wright [1934]). This was to treat quantity marketed Q and price 
P as both completely determined by two auxiliary hypothetical vari- 
ables, the supply situation S (quantities that would be marketed at 
given prices) and the demand situation D (prices that would be offered 
for given quantities marketed) in accordance with classical economic 
theory. It was assumed that the demand and supply variables, as 
percentage deviations from their norms, can be represented to an 
adequate approximation by a line with constant slope but intercepts 
on the axes of means that varied from year to year (Figure 8). 

The path diagram, including a variable A that is known to be cor- 
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Supply Curve 
eXs 


Ys eXs 


Demand Curve 
Yo =(D-G)+ 


FIGURE 8. 


GEOMETRIC RELATIONS OF PricE P AND QuaLiry MARKETED Q To SUPPLY AND 
DEMAND CURVES UNDER CLASSICAL Economic THEORY. 


related with demand, but not supply, and a variable B that is known to 
be correlated with supply, but not demand, is given in Figure 9. It is 
assumed that the supply and demand situations vary independently. 
Single letters with subscripts are again used for the path coefficients for 
convenience. Considering A but not B we may write the following 
equations: 


Tea = 


Toa = QiTda 


% = Didi + » 
roo 


FIGURE 9. 


Pata D1acramM oF RELATION oF PricE P AND Quantity MARKETED Q FROM THE 
View Pornt or ComMPLeTE DETERMINATION OF EACH By SUPPLY AND 
DEMAND AS IN Ficure 8. 


A and B are external factors correlated respectively with demand and supply deviations. 


These are easily solved. 


pa gi) 
from the fourth and fifth equations. 
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PG = (Teg — Pits) = Tro — + Pig 
from the third equation. 


Tra 
Toa 


from the first and second equations. 
Thus 


2 
n= rhe) /(1 22 + 4), 
Toa Tea 
Toa = = Tra/Pr Po = 


The signs must be such as to satisfy the first three equations to be 
solved. 

If a variable of type B is known, analogous equations can be written 
and solved. If there is reason to believe that there is a correlation 
between the supply and demand situations, it requires knowledge of 
variables of both types A and B to solve for the seven unknown paths. | 

This is a case in which the ratios of certain of the concrete path 
regressions that give the slopes of the supply and demand lines, (the 
elasticities of supply e and of demand 7), are ordinarily of most interest. 
The equations could be expressed directly in concrete terms but the 
third would invole the hypothetical variances of D and S as well as the 
known ones of P and Q. The simplest procedure is, as in most cases, 
to write the equations in terms of the standardized coefficients, .un- 
encumbered with useless variances, solve and only as a final step deduce 
the concrete expressions that are desired. 

The regression equations for Q and P are as follows: 


Q- Q = Cen(D — Q) + — Q), 
P-—P= Cpp(D — Q) + cps(S — Q). 


While these coefficients can only be calculated symbolically, 
(Con = 9:09/cp etc.), the desired concrete elasticities (slopes) can be 
calculated as ratios in which the unknown standard deviations cancel. 


Xs —P. Cpp 


(g=$) _ Cos _ $200, 
D=3 Cps 


Following are some illustrations. The first two trace to the corn 
and hog study referred to above. The third is from data provided by 
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Professor Henry Schultz (Wright [1934]). The last two were reported 
by P. G. Wright [1928] who, in a study of the effects of the tariff on 
animal and vegetable oils, made, at my suggestion, a comparison of the 
results of this mode of approach with results (given in parentheses 
below) that he had arrived at by another method (segregating successive 
observations in which price and output changed in the same direction).’ 


Commodity Elasticity 
e (supply) n (demand) 


Summer hog pack (1889-1914) | +.13 — .94 
Winter hog pack’ (1870-1914) | +.11 — .88 
Potatoes (1896-1914) +.03 — .82 
Butter (1914-1926) +1.43 (+1.65) —.62 (—.53) 
Flax (1914-1926) +2.39 (+1.88) —.80(—.81) 


The results from the two methods in the last two cases probably 
agree as well as can be expected in view of the paucity of data. The 
radical difference between the negligible elasticities of supply in the 
first three and the high values in the last two is clearly brought out. 

Consider next the same situation from the viewpoint of a diagram 
in which direct interaction of P and Q is indicated. The causal relations 
seem to be represented best by treating D as a direct factor only of P, 
and S as a direct factor only of Q. 

We will deal first only with P, Q and the two hypothetical variables 
D and S. Different symbols are used for the path coefficients since 
even those that connect the same variables (ppp , pgs) in Figures 9 
and 10 cannot be expected to have the same values. 


Too = = yw/(1 — zy), Teg = 1 = yreg + 2% Qs , 
Tep = W+ = wW/(1 — zy), = = T+ 


Tes = Iqs = rz/(1 — zy), tarps = + , 


Tas = 2 + = 2/(1 — zy), «+ y— 
= 1= + , (x + y)/(1 + zy). 


As only three known correlations (rpp , fgg , Trg) are analyzed, 


1] wish to correct an unfortunate reversal of ep and og in the data at the bottom of p. 200 in the 
1934 reference. 
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solution requires use of a known variable of type A or B. Either of 
these adds two usable equations (e.g. = YTra Trea = + Qa) 
at the expense of only one additional unknown path coefficient and 
thus permits solution: 


Y = Toa/Tpa ; 

(from the last equation) 

w= V(1 — zy)(1 — 
(from the fifth equation) 

2= V(1 — zy)(1 — 
(from the sixth equation) | 


Toa = Tpa(l — xy)/w. 


The regression equations for Q and P are as follows: 


Q- 


By substituting each in the other, these can be me in terms 
of S and D for comparison with the preceding analysis: 


—Pp p-3 top Cpa 
Thus from the point of view expressed in Figure 10, the elasticity 
of supply is given directly by the path regression of quantity on price 
and the elasticity of demand is given by the reciprocal of the path 
regression of price on quantity. It should be noted that path re- 


gressions as well as path coefficients are. not transferable between 
diagrams 9 and 10 which present different points of view with respect 
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A 


FIGURE 10. 
Pato DIAGRAM OF RELATIONS OF THE VARIABLES IN FIGuRE 9 
Path diagram of relations of the same variables as in Figure 9 from the viewpoint of reciprocal inter- 


action of price (affected directly by demand) and quantity marketed (affected directly by supply). 
All variables are the same as in Figure-9. 


to causal relations. The elasticities come out the same by both methods 
of calculation. 

From comparison of the regression equations it may be seen that 
the relations of the concrete coefficients with identical subscripts from 
Figures 9 and 10 are as follows: 7% 


e 
Cosa) = — —) , 

n 
e 
Cpoao) = 1 — 


THE CASE OF RESPIRATORY HOMEOSTASIS 


Turner and Stevens take as an example of feedback the relations 
between CO, concentration in respired air A, that in the alveolar air 
of the lungs C and depth of respiration D in the classical data obtained 
by J. S. Haldane and J. G. Priestley [1905]. The above authors use a 
diagram of the type of Figure 11 except that they do not include residual 
factors V and W. 


FIGURE 11. 
PatH DIAGRAM OF RELATIONS IN RESPIRATION 


Path diagram of relations of COs concentration of respired air A, COz concentration of alveolar air 
as estimated C and depth of respiration D in subject J.S.H. V and W represent residual factors. 


The essential data used by Turner and Stevens are as follows on 
expressing the relations among the measured variables in terms of 
correlatjon coefficients instead of regressions. 


% CO, in respired air: A = 3.106, o4 = 1.6052, rp, = .9814 
% CO, in alveolar air: C = 5.810, o¢ = .5058, rnc = .8132. 


Depth of respiration: D = 1224.5, op = 431.7, rc, = .8577. 
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Turner and Stevens use the observed regressions bp, and b;-, but 
make no use of the observed relation between C and D. As a third 
datum, they assume that the path regression of C on A is unity, as 
expected under a steady state with depth of respiration constant. 

Thus they used equivalents of the following three equations and 
obtained the indicated path regressions. The corresponding path 
coefficients are also given here: 


= .9814 = Poctca Poc = 1.1442, = 977, 
8577 = + Pca Peo = — 2.3597, Cop = — .00277. 


This solution implies that rep = (pep + Poc)/(l + PcpPpoc) = 
.7150. This is considerably smaller than the actual value in the data 
(.8132) though not significantly since there were only 15 sets of obser- 
vations, here given equal weight. The path coefficients, ppw and 
Pcv , that indicate the importance of residual factors can readily be 
calculated, but while the former comes out unduly large the latter © 
comes out imaginary (on making no use of observed rcp). 


Coa 


Toa 


Pow = V(i = PcvoPoe)(1 vc) ‘820, 
= V(1 — pealca — — PeoPoc) = V —.1284. 


There is strong a priori reason for treating C as only a somewhat 
imperfect measure of CO, content in the alveolar air. The estimates 
were based on the CO, content in samples taken after expiration to 
nearly the greatest possible expect, corrected for residual non-alveolar 
air which in turn was based on estimation of the “dead space” in the 
lungs of the subject. This suggests the introduction of a variable X 
for the actual CO, concentration of pulmonary air that has the average 
relation to A and D and treatment of C as determined by this and 
U, random errors of technique (Figure 12). 
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FIGURE 12. 
MopiFiep Pato D1aGRram oF VARIABLES IN FIGURE 11 


Path diagram of same variables as in Figure 11 except that C is represented as an imperfect meagure 
ef alveolar CO: content, deviating at random U from a quantity X which stands in the average relation 
to A and D in these experiments. 
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There are now seven paths, but the three known correlations, the 
three cases of formally complete determination and the path regression 
Cra = 1 permit solution with results for the path coefficients indicated 
in Figure 12. The following path regressions can be calculated, taking 
Ox = = 4675, 


= i. = Cxp = — .00277, 
Cox = 977, Coxa = Cyn = 1. 


The path regressions are the same as before but by interpreting 
C as a somewhat imperfect measure of the CO, concentration of the 
alveolar air, a system is obtained that is consistent with all of the data 
and which indicates reasonable degrees of importance of the residual 
factors. 

The above treatment does not use the data on one of the variables 
measured by Haldane and Priestley, the frequency of respiration F. 
They made 39 experiments on one subject (J.S.H.) and 67 on another 
(J.G.P.). Unfortunately for the present purpose only the averages for 
groupings of similar experiments are recorded (15 and 21 groups re- 
spectively). The number of experiments in a group varied from 1 to 
4 in the former case and | to 6 in the latter. The best available values 
to use for the correlations and regressions are those derived by weighting 
the reported group averages. 

In the case of J.G.P. the virtual absence of correlation between 
the estimates of CO, in the alveolar air C and the other variables, the 


% CO; in respired air 
% COs in alveolar air 
Depth of respiration (cc) 
Frequency of respiration 


Cérrelations 


7 

Variables Subjects a 

J.S.H. J.G.P. 

(A) 3.052 1.350 | 3.507 1.348 race, 
(C) 5.767 478 6.568 241 ike 

(D) 1200 358 794 214 ee 

(F) 15.21 2.203 | 17.60 2.161 ' a 

823 093 

TDA .979 710 
Toe .772 O15 be 

TRA .501 .779 a 

ror .530 | 416 
Tor .354 | — .067 
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high mean and low standard deviation of C suggest an overcorrection 
for nonalveolar air. The results brought out sufficiently well the close 
control of alveolar CO, , in spite of enormous variability in that in 
the respired air, but do not permit satisfactory path analysis. The 
data from J.S.H. appear to be more satisfactory for this purpose 
but the probability that the correction was not exact must be borne 
in mind. We will use the data to illustrate the possibilities of path 
analysis in more complicated systems than a single reciprocal inter- 
action, recognizing that they are not as suitable as if recorded with 
this in mind. 

The following pattern, Figure 13, was used after it appeared on 
trial that it was necessary to assume that a common factor or factors 
contributed positively to correlations between F, X and D. The 
residual factor Z, back of X, is represented as acting on all three of 
these. This residual factor presumably consists primarily in variability 
in the amount of CO, released into the blood by metabolism throughout 
the body. If X were a measure of CO, in equilibrium with the arterial 
blood, it might be supposed that the effects of Z on depth and frequency © 
of respiration would be fully taken care of in the paths of interaction 
of X with Dand F. The fact that this is not the case indicates that X 
does not fully represent CO, content of the blood. The increased com- 
plexity of the diagram makes the use of simplified symbols for the path 
coefficients desirable, earlier letters of the alphabet for the measured 
variables and later letters for hypothetical ones. 
In solving, we begin with the assumption that 


ox oc 
= — = — = 1. 
Oa 


From the observed values of o¢ and co, , 22¢,; = 2.8249. 


FIGURE 13. 
Paty DiaGraM, FREQUENCY OF RESPIRATION AND VARIABLES OF FiGuRE 12 


Path diagram in which frequency of respiration (F) has been added to the variables of Figure 12. 
V is here the residual factor for F and Z is added as the residual for X and as also affecting F and D 
enough to give a unique soluti The correlati are here weighted by the ber of experiments 
instead of relating to unequal groups of experiments, treated as of equal weight as in Figures 11 and 12. 
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Tea = 8229 = ery, , Toa = .9793 = dirxa , 
Tra = .0008 = firxa 
Txa = + Lo + = L2/(1 — x, d, — = 22/K. 
Thus rea = 29¢,/K = .8229, K = 3.4329, 
d, = 1.190lc, , f, = .6086c, , 
2.8249/c,, = —(2.4329 + zsf,)/d, . 


The equations (excluding those involving C) are given in Table 1. 
Solution requires that path coefficients be found that satisfy the relations 
arrived at above, that yield the same result from the three equations 
for rxp , the three for rxr , the three for rpr and that satisfy the 3 equa- 
tions of complete determination. The following approximate values 
were obtained by trial and error. They yield the correlations shown in 
the last column of Table 1 which agree to two decimal places where 
there is a test. 


C, = Pex = .85, d, = pox = 1.012, exp = —.00263, 

Co = Dev = .927, d, = Pow = .28, Cx, = 1, 

Ly = Pxp = —2.328, ds = Poz = .30, Cxr = — .0280, 

Ze = Pxa = 3.328, fi = pex = .517, Cox = 893, 

Y3 = Pxr = —.15, fo = Prx = .78, Cry = 2.81. 

= Pxz = fs = Prz = .48, 

It may be seen that the relative effect (x, = —.15) of rate of respi- 
ration on alveolar CO, is almost negligible in comparison with that 
(x, = —2.328) of depth of respiration. 


It is apparent, however, from the values of the coefficients measuring 
the direct influence of metabolism Z or depth and frequency of respira- 
tion that X cannot be identified fully with the CO, content of alveolar 
air in immediate equilibrium with the arterial blood (which it will be 
convenient to designate Y). It must be considered to be the function 
of variable Z, D, F and A from which the estimate C may be considered 
a random deviate. It is instructive to attempt to add Y to the diagram. 

With Y present, one might suppose that Z would affect D and F 
only through its effect on Y, thus eliminating the direct paths Z — D 
and Z— F. In the solution arrived at above, however, prz is too large 
and ppz too small. In defining Y, the arbitrary assumption will be 
made that the values cf ppz and prz in the new diagram are to be 
numerically equal but the former negative and the latter positive. 
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TABLE 1 


ForRMULAE FOR CoRRELATIONS BETWEEN VARIABLES OF FIGURE 13 AND 
CALCULATED FROM THE DEDUCED COEFFICIENTS 


Formula 


1- xd, 


.8229/c, 

Dil pa + 22+ = — — tafi) = 
.9793 = dirxa 

.5008 = Sitxa 


= 2Tpw + = 
=drxw +d 


= firxw 


= + = 
= dirry 
=firxy the 


= pz + + = + + 
= + 
=firxz ths 


= -7718/c1 
= + pa + pr + dz 
=d, + + dirxz 


= .3541/c, 
= pr + + + 
=fitfrxy 


= .5295 
= dryer + dtrw + dsrrz 
=firxp 


peed 


BES 


= 1 = 2rxp + + xr + xz 
Top = 1 =arxp + drpw + 
ter =1=firxe + ferry + 


288 


With this assumption a solution is readily arrived at. Some of 
the path coefficients of the new diagram (Figure 14) (which will be 
distinguished by primes) are unchanged. Others can be deduced. 
The rest can all be expressed in terms of y{ (= p}x) and thus can be 
obtained by iteration for only this one parameter. 
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Value 
No. 
3.433 
3 .968 
.979 
| 5 Tra = . 
—.190 
8 
9 Txv 
1l Try 
12 Txz 
13 Tpz 
14 Trz 
15 TxD ~ 
| 16 
| 17 
18 Tyr 
19 
20 
23 
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25 
26 
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FIGURE 14. 
System or FicureE 13, VARIABLE Y ADDED 


Same system as in Figure 13 except for addition of variable Y, intended to give a better representation 
than X of the COs concentration of the alveolar air in direct equilibrium with that of the arterial 
blood by making ppz and pprz equal in absolute value but opposite in sign. 


By comparison of equations 1-7 and 2-8 for rpw in the two systems 
(Tables 1 and 2) it appears that d{ = d,/y{ and that d} = d,. Similarly 
from the two equations for rey , 1-11 and 2-13, it appears that f/ = 
fi/yi and = fe - 

From 1-24 and 2-33 supplemented by 2-23 we obtain: 


2,(1 — = 2.(1 — 


From 2-31 and 2-36 and the substitutes dj = d,/y, , two expressions 
can be obtained for ryr/y{ which lead to a solution for fj on the assump- 
tion that d; = —f;. Another solution for dj can be obtained similarly 
from 2-32 and 2-35. These differ slightly (.196, 206) because the 
solution of the system was carried only to approximate agreement for 
two decimal places. The average f; = .201, d; = —.201 is adopted. 

From 1-25 and 2-35, diy; = d; — dj , and from 1-26 and 2-36 
fiy; = fs — {3 . From the ratio of zjy} above to either of these we 
obtain slightly discrepant values of xjy{ , (.400, 394). The values of 
xi , 2, can now be calculated, taking xj as .397/y{ and yj can be ex- 
pressed as a multiple of yf. It was found by trial and error that yf = 
.944 to a good approximation. The path coefficients for this second 
diagram. (Figure 14) came out as follows: 


cy = .85, dj = 1.072, yi = .944, 
c; = .527, d; = .28, y2 = .502. 
= —1.404, d; = —.201, 
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TABLE 2 


FORMULAE FOR CORRELATIONS BETWEEN VARIABLES OF FIGURE 14 AND 
CALCULATED FROM THE DEpucED Pats CoEFFICIENTS 


Formula 
K’ =1 — yi(dizi + + 24) 
Txa = .8229/c; = tirp, + 22 + + = 24/K' 
Tya yirxa 
TDA .9793 = diry, = diyirxa 
| Tra .5008 = firy, = fiyirxs 
+ 


Txw = + + = ds/K’ 
= Yilxw 

Tpw = diryw + = diyirxw + 

rew =Siryw = Sivirxw 


, , 
= + + = raf2/K’ 
= 

, 

= diryy = diyirxy 

/ 
= firyy +f2 =fiyirxy +f? 


= zitpz + + ziryz 
= yirxz + 9: 
= diryz + di 
=firyz +f3 


= yirxp + pz 
= di + + diryz 


=fitsyyy 
= yirxr + yirrz 


= zirpy + + + 24 
= yi + yirez 


= 
= + tirpr + 
= dirxgy + dirxw + dirxz 


= .3541/ca 
= + rr, + 23 + 
=firxsy + 


= 
= diryr + dirpw + direz 
=firyo + 


SES) SS 
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“4 
Value 
2.0707 
. 968 
.914 
.979 
.501 
—.190 
—.179 
+ .088 
— .098 
— .034 
— .032 
— .035 
. 762 
— .143 
. 367 
.192 
| .402 
48 
19 | .948 
.597 
.596 
TxD .908 
.910 
.907 
TXF 
.421 
4 
.531 
.532 
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TABLE 2—(Continued) 


No. Formula | Value 
33 = 1=2irxyp + + + | 
34 tyy = 1 = yirxy + yiryz 1.005 
35 Tpp = l= diryp + dir pw dir p, 1.004 
36 tre =1= firye + +S 1 003 

xi = 2.005, fi = 548 

xi = .420, fs = .201, 


The rate of increase of CO, in Y should be proportional to Z while 
its rate of decrease in the lungs should be proportional to the difference 
between the concentrations in Y and X. At flux equilibrium 


dY 
an — — X) = 0, 


k(Z + 6Z) = + — X — 6X), 
k, = kp bY — kz 8X, 


Thus cyy = 1, cyz = k,/kz. 

In the case of X, CO, tends to increase at the rate k.(¥Y — A) wn. 
to decrease jointly according to rate and frequency of respiration ») 
the difference between the concentrations in X and A. 


dX 
dt 
Ignoring second order terms 
k.(8Y — 6X) = k,[F(X — A) sD + D(X 4) 6F + 
- k[F(X - 4) 6D + D(X A) oF 
A 


6Y 6Z sx = Cyz 6Z + cyx bX. 


= k,(Y — X) — DF(X 1) =0 
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Thus CxA + Cxy = i, 


x oc 
Cra = = 2.005¢; = .603. 
A 


Ca 


It appears from this that cyy = .397. The value of cy can be esti- 
mated from cxy = %jox/oy = .397 giving sy = .430 which while slightly 
larger is at least close to the estimate for cx = .406 as expected. 

As variable X has no simple physiological meaning, it may well 
be removed from the system. The path coefficients in the simplified 
diagram, Figure 15, (to be indicated by double primes) can easily be 


FIGURE 15. 
System oF Figure 14 Some VARIABLES REMOVED 


Same system as Figure 14 except for removal of variables X, C and U’. 
deduced directly from those of Figure 14. 


The path coefficients are 


deduced from comparison of the formulae for the same correlations. 
All but y2’, y3’, and y{’ are unchanged. 


= — = —2.201, ys’ = — yiat) = —.142, 
ys’ = ylzi/(1 — = 3.142, = — = .831. 


The following path regressions may be calculated. 


= ys = 1.000, 


A 


4 


o Tp 
Cyp = = — .00264, Cpy = dy’ — = 892, 


D Oy 


Cyr = = — .0277, Cry = 2.81. 


Y 


The shift from variable X to Y results in no appreciable changes 
in the path regressions. Figure 15, however, probably gives a better 
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representation of the physiological situation in this series of experiments. 
The variations in CO, content of the outside air A tended to produce 
directly more than three times as great a standard deviation in the 
CO, content of the alveélae Y as that actually observed (py, ='3.142) 
and the variations in depth of respiration D tended to produce directly 
more than twice the observed standard deviation (pyp = —2.201) but 
these largely cancelled each other because of the almost perfect positive 
correlation (rp, = +.979) between them, due mainly to stimulating 
effect of CO, in the arterial blood Y, in equilibrium with that in the 
alveolar air on the respiratory center and hence on depth of respiration 
D (poy = 1.072). There was also a moderately strong stimulatory 
effect on frequency of respiration (pry = +.548) but the effect of the 
* latter on CO, content of the alveolar air was practically negligible in 
this subject (pyr = —.142). 

Variations in metabolic rate from experiment to experiment tended 
to produce almost as wide variation in CO, content of the arterial blood 
as observed (pyz = .834) but nevertheless much less than the effect of 
variation in CO, of the outside air and of depth of respiration by them- 
selves. Taking the analysis at face value, increase in metabolic rate 
tended to cause a shift from response by increased depth to response 
by increased frequency of respiration but this interpretation may put 
too much weight on the small coefficients ppz = — .201, prez = + .201. 

Finally the analysis indicates that depth of respiration was affected 
little by residual factor (ppw = .28) but that the variations in frequency 
of respiration were largely unaccounted for (pry = .78). 

A similar analysis of the data from the other subject (J.G.P.) is 
not practicable for reasons already noted. There appears to have been 
an overcorrection for dead space in the lung that has left nothing but 
random variations in the estimates of alveolar CO, content C. 

A rough comparison may, however, be attempted on the basis of 
the correlations among A, D and F and borrowing from the analysis 
for the other subject. Three statistics must be borrowed to compensate 
for the Ahree missing correlations. 

We might take cy, = Pyascy/o, = 1 as one of these. This yields 
Pya = 5.59 if oy is equated to observed co, . This is considerably larger 
than that from the other subject (3.142). The observed value of o¢ 
for J.G.P. is, however, probably too low because of overcorrection. 
It may be better for purposes of comparison to borrow the path co- 
efficient 3.14 from the previous analysis. The other statistics that seem 
most appropriate for borrowing are those relating F and D to Z even 
though there is no reason to suppose that the path regressions of fre- 
quency and depth of respiration on variation in metabolism are exactly 
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A 
(4, = 13.142) 
93-2050, 
F 12 


(4, = ae) 


FIGURE 16. 
System or Ficure 15 ror Seconp SuBJEcT 
Same variables as in Figure 15 but with respect to subject J.G.P. instead of J.H.S. In order to obtain 


as nearly comparable results as possible in the absence of usable correlations involving C, the values 
of pya , Ppz and prz (in parentheses) are borrowed from Figure 15. 


the same in the two subjects or that the standardized coefficients are 
identical. They should be small and somewhat similar, however. 

Thus for the purpose of comparisons, it has seemed simplest, in 
default of clear reason to the contrary, to borrow the three path co- 
efficients, py, = 3.142, prz = +.201 and ppz = —.201. The results 
are shown in Figure 16. They bring out systematically the great 
difference between the subjects implied by the difference in correlations 
Tp, and rr, . Alveolar CO, content Y seems to have been controlled 
about twice as much by frequency of respiration as by depth in the 
case of J.G.P. in contrast with virtually exclusive control by depth 
in the case of J.S.H. 


STANDARD ERRORS 


We have not discussed standard errors. The validity of a diagram 
of relations rests on grounds to which standard errors can not be applied. 
The standard errors of the path coefficients or path regressions depend 
on the diagrams. It must be recognized that these coefficients do not 
belong in the same category as such statistics as ordinary partial regres- 
sions that are valid irrespective of any causal interpretation (cf. Wright 
[1934]). Path analysis is an extension of the usual verbal interpretation 
of statistics not of the statistics themselves. It is usually easy to give a 
plausible interpretation of any significant statistic taken by itself. The 
purpose of path analysis is to determine whether a proposed set of 
interpretations is consistent throughout. 


SUMMARY 


The application of path analysis to reciprocally interacting variables, 
with or without lag between action and reaction, is considered in some 
detail. The data of Haldane and Priestley on homeostatic control of 
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respiration, introduced recently in this connection by Turner and 
Stevens, are used to illustrate the process of exploring different points 
of view which is one of the most useful features of path analysis. 
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AN EXTENSION OF A TRUNCATED POISSON 
DISTRIBUTION* 


A. Cuirrorp CoHEN, JR. 


The University of Georgia, 
Athens, Georgia, U.S. A. 


1, INTRODUCTION 


When random numbers of organisms are deposited at randomly 
selected sites following which random migration of individual organisms 
may occur between sites, the number of organisms to be observed at 
a site is often found to be distributed in accordance with one of the 
various contagious distributions studied by Neyman [6], Beall [1], 
Feller [4], Beall and Rescia [2], Gurland [5] and others. When migra- 
tion between sites does not occur as for example in the case of eggs 
which are immobile, the true contagious distributions are not always 
applicable. Even when applicable, estimation of population parameters 
and subsequent calculation of expected frequencies in contagious distri- 
butions are likely to be tedious and time consuming. In particular, 
maximum likelihood estimating equations for parameters of these 
distributions are often more or less intractable. 

In this paper, we consider a simple two-parameter extension of a 
truncated Poisson distribution for use in the case mentioned above 
wherein no migration of organisms takes place between sites. Maxi- 
mum likelihood estimating equations for the parameters of this distri- 
bution are derived in a form which permits the ready calculation of 
estimates from sample data, and these results are illustrated with a 
practical example taken from Beall and Rescia [2]. Asymptotic vari- 
ances of sample estimates are also obtained. 


2. DERIVATIONS 


Although the distribution which we wish to consider might be 
studied from a strictly abstract point of view, it seems desirable to 
keep in mind the.specific practical example which prompted this in- 


*Sponsored by the Office of Ordnance Research, U. S. Army. 
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vestigation. Accordingly, let us consider, say, the distribution of 
organisms among colony sites under circumstances such that no migra- 
tion occurs between colonies. Suppose that possible sites are distinct 
and countable, and that each has equal probability of being selected. 
Let @ designate the probability that any particular site is selected. 
Suppose that once a site is selected, one or more organisms are de- 
posited there, and that this number is a truncated Poisson random 
variable with missing zero class. Now suppose that a colony is ob- 
served at random and a count is made of the organisms present. The 
probability function of a random variable x, the number of organisms 
present, is 


— , z=1,2,3,---, 


(1) 


where \ > 0 and 0 <6< 1. When @ = 0 we have the trivial case 
in which only zeros occur. When @ = 1, we have the simple truncated 
Poisson distribution from which the zero class is missing. 

Consider a sample consisting of N observations of random variable 
x with probability function (1). The likelihood function is written as 


P(a, , ty 3 8,d) = (1 — il —e 2, !, (2) 


where n, is the number of sample observations for which z = 0, and 
n* = N — n,, the number of sample observations for which x = 1, 
2, --- . Taking logarithms of (2), differentiating with respect to @ 
and X in turn, and equating to zero, we obtain estimating equatiors 


= (n*/6) — [n/(l — = 0, 
aL/ar= 
where L is written for In P. 
The required estimates, which we designate 6 and \ to distinguish 


them from @ and X, the parameters estimated, are roots of these equa- 
tions. Solving the first equation of (3), we have 


6 = n*/N. (4) 
From the second equation of (3) we obtain 
&* = d/(1 — (5) 


where @* = )-"*, z,/n*, is the mean of the n* non-zero sample obser 
vations. It may be recognized that equation (5) is the maximum 
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likelihood estimating equation for a sample of n* observations from 
a truncated Poisson distribution from which the zero class is missing 
(cf., for example, reference [3]). The required estimate { is found 
by solving (5). With z* entered for z, \ is easily obtained by inter- 
polating linearly in Table 1 of [3] or by reading directly from the chart 
of Figure 1 in the same reference, which conveniently appears on 
page 206 of the current volume of this journal. 


3. ESTIMATE VARIANCES 


The asymptotic variance-covariance matrix of (6, \) is obtained 
by inverting the matrix whose elements are negatives of expected values 
of the second-order derivatives of L. Further differentiation of (3) 
gives 


aL /ae = —(n*/6*) — [no/(L — 
= 0 = &L/ad 0, (6) 
= —n*{(e*/d*) — [e*/(1 — 
It follows that 
V(6) ~ — 6)/N, and VO) ~ D/E@*)¥Q), (7) 


Il 


where the expected value E(n*) = N@, and 


wa) = (1 — e)*/[L — A+ (8) 


The asymptotic covariance is zero and thus 6 and i are asymptotically 
independent. It was pointed out in [3] that (A) is continuous and 
monotonic decreasing, that lim,.. y(A) = 2, lim)... ¥(A) = 1. There- 
fore, regardless of the value of \, the asymptotic variance of 4 satisfies 
the inequality \/N@ < V(K) < 2A/N@. To facilitate the calculation 
of V(A), a table of ¥(A) was provided in [3]. 


4. AN ILLUSTRATIVE EXAMPLE 


To illustrate the results of this paper, a sample of the number of 
European corn-borers, Pyrausia nubilalis Hubn. on small unit areas 
of a field as observed in 1937, was selected from Beall and Rescia [2]. 
Following is a tabulation of the observed data along with expected 
frequencies calculated using results of the present paper and, for com- 
parison, expected frequencies based on a complete Poisson distribution 
and on five different contagious distributions considered by Beall and 
Rescia. The n’s in the expected frequency column headings represent 
various values assigned to a constant which appears in a generalization 
of Neyman’s contagious distributions proposed by Beall and Rescia 
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TABLE 1 
OBSERVED AND EXPECTED FREQUENCIES OF Pyrausta nubilalis in 1937. 


Expected Frequencies, fz 


*Fit as given in Beall and Rescia [2]. 


[2]. When n = 0, 1, and 2 respectively, the generalized function becomes 
Neyman’s Type A, B, and C distribution in turn. 

Summarizing the data for this example, we have N = 56, ny = 33, 
n* = 23, and assuming that the observation which Beall and Rescia 
reported as 5+ was actually 5, * = 42/23 = 1.8261. From (4) we 
have 6 = 23/56 = 0.4107, and from (5), @* = 1.8261 = A/(1 — e””). 
Linear interpolation in Table 1 of [3] gives } = 1.355. 

The expected frequencies shown in the column at the right of Table 1 
were computed with estimates 0.4107 and 1.355 substituted in (1) for 
6 and X respectively. Expected frequencies for the complete Poisson 
distribution were computed using p(z; \) = e *A*/x! with the estimate 
\ = = = 42/56 = 0.75 substituted for \. The superiority of the 
truncated Poisson distribution-as modified here, in fitting the observed 
distribution of this example is clearly demonstrated. Since corn- 
borers are moving organisms, it is of course possible for them to migrate 
to some extenyfrom one site to another. However, since the observed 
distributi6n was not satisfactorily fitted by any of the contagious 
distributions tried, and since the truncated Poisson distribution gave 
such a close fit, it seems reasonable to suggest that perhaps no migra- 


tion actually occurred between the sites considered in collecting these 
data. 


a 
No. Obs. Com- Trun- ; 4 
Insects| Freq. | n=O] plete cated 
x fo (*) (*) (*) (*) Poisson | Poisson 
0 33 | 37.8 | 37.1 | 36.8 | 36.1 | 35.7 | 26.5 33 be 
1 12 5.6 6.8 7.3 | 8.3 8.8 | 19.8 10.8 ee 
2 6 5.2 5.0 5.0 | 5.0 5.1 7.4 7.4 a 
3 ‘3 3.5. | 3.2 3.1 | 2.9 2.9 1.9 3.3 oT ee ag 
4 1) | 1.9 | 19 | 18] 1.7] 16 3 1.1 Sok 
5+ 1 2.0 2.0 2.0 | 2.0 1.9 | 4 pe 
x? 9.04 | 5.57 | 4.47] 2.90] 2.17] 11.95 0.59 rae: 
df. 2 2 2 2 2 
064] .11| .24 .34 .008 .75 
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Asymptotic variances of 6 and \ are computed from (7) with esti- 
mates 0.4107 and 1.355 substituted for 6 and \ respectively. By linear 
interpolation in Table 2 of [3], we obtain ¥(1.355) = 1.39. Accordingly, 
we calculate V(6) = 0.4107(0.5893)/56 = 0.0043, and, since E(n*) 9.5 = 
N6 = n*, V(A) = [1.355/23] 1.39 = 0.082. 
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OPTIMUM ALLOCATION IN REGRESSION EXPERIMENTS 
WITH TWO COMPONENTS OF ERROR 


Rosert M. DeBaun Anp Victor CHEw! 


American Cyanamid Company, 
Stamford, Connecticut, U. S. A. 


0. Summary 


In the literature, experimental designs for regression analysis are 
usually optimized with respect to the total number of observations 
to be taken. In this paper, cost functions are introduced and optimum 
designs are derived for both extrapolation and interpolation, including 
the case where “split plotting’’ occurs. 


1. Introduction’ 


In many experiments, there are two components of error. Thus, 
an observation may be expressed as the sum of treatment effects and 
two random variables as in (J). 


Yi; =f b; + Wi; (1) 
where Y,,; is the value of the observation, ¢ is the grand mean plus 
treatment effects, b; is the observation of a random variable, with 
zero expectation and variance o; , taking the value ; for all observations 
in the ith group (plot), i.e., distributed independently only for obser- 
vations from different plots; and w,,; is the observation of a random 
variable,- with zero expectation and variance o2 , taking the value 
w,; only for the jth observation in the ith plot, and thus distributed 
independently for all observations. Experiments giving rise to such 
data are usually called split plot experiments. The treatment com- 
binations are often assigned to the sub-plots (experimental units) so 
that the error effects are orthogonally subdivided among the treatment 
effects, as in Table I below. 

The effect (B) has an error variance of (203 + o2), while the effects 
(A) and (AB) are associated with the within-plot error only. We 
have thus enhanced the precision of the estimated (A) and (AB) 


1Present Addreas, Naval Weapons Research Laboratory, Dahlgren, Virginia, U.S.A. 
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TABLE I 
TypicaL Spiit-PLot EXPERIMENT 


Treatment Combination Plot Observation 


(1) Yu 
(a) Yie 
(b) Yu 
(ab) Ya 


effects at the expense of effect (B). In many agricultural experiments, 
split-plotting may be only a convenience. In the physical sciences, 
on the other hand, it is often a necessary course of behavior. For 
example, a replication in full plots may be very expensive, whilst 
replication within plots may be relatively cheap, as when aliquots of 
experimental material are the within-plot elements and whole lots of 
experimental material represent the whole plots. Clearly, as the rela- 
tive magnitude of between- and within-plot errors changes, as well 
as the incremental cost of additional within-plot replications, varying 
experimental strategies will be counselled. This general subject has 
been studied extensively in terms of the usual experimental designs 
(factorial experiments, Latin Squares, etc.). We propose to discuss 
here the problem of regression experiments, particularly those in which 
extrapolation is the principal purpose. 

In many chemical experiments, the method of extrapolation to in- 
jinite diliuion is used. Thus, in determining the physical properties 
of polymeric substances, viscosity measurements are made at varying 
concentrations of a solute in a solvent medium. When viscosities 
(relative to those of the solvent) are plotted against concentrations, 
the molecular weight of the solute is determined by the limiting slope 
of relative viscosity on concentrations. (Figures Ia and Ib). For 
example, if the graph in Figure Ia can be approximated by a quadratic 
expression in 2, then the intercept of Figure Ib reflects the limiting 
slope (i.e. the particle weight of the solute), while the slope of Figure 
Ib may give useful information on the shape (as opposed to the weight) 
of the dispersed solute particles. 

Also, again with polymeric substances, a function of the amount 
of light scattered by the solution when impinged by a beam of mono- 
chromatic light (Rayleigh ratio) is measured at varying solute con- 
centrations and angles around the solution. The Rayleigh ratio is 
expanded, in a Taylor series, in terms of concentration and squared 
sines of half-angles of scattering. The zero concentration and zero 
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x 


FIGURE Ia 
Grapa or Viscosity Y AGAINST CONCENTRATION X 


FIGURE Ib 
Grapu oF Repucep Viscosrry Y/X AGAINsT CONCENTRATION X 


angle intercept estimates the reciprocal of the solute particle weight, 
and the limiting slopes of the Rayleigh ratio on angle and concentration 
are also of considerable physical interest. 

In the light-scattering experiment, for instance, the exigencies of 
the experimental technique are such that it is imperative that the 
experimenter measure at as few concentrations as possible. Thus a 
carefully prepared solution of known concentration is placed in the 
apparatus and light scattered therefrom is measured at a series of 
angles. To prepare a solution of another concentration may require 
5 to 100 times as much work as to read an already prepared solution 
at another angle. Similar situations arise in other fields of applications. 
In animal husbandry experiments, for example, often cows form the 
main plots and treatments are applied ‘to the different parts of the 
animal in injection and palatability tests, or given to the animal over 
periods of time in feeding trials. Introduction of another sub-plot 
treatment is clearly much less expensive than that of a whole plot 
treatment. 

Where split-plotting arises naturally, it is the duty of the statistician 
to recognize this fact, make allowance for it in the interpretation of 
experimental data and, if possible, to take advantage of it. In what 
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follows, we treat the problem of such experiments both in the case 
of fully independent errors of observations and in the split-plot case. 
We take into account the incremental cost of replications within plots 
and seek experimental designs which provide maximum information 
for the total cost of the work, as opposed to maximum information 
for a given number of observations. — 


2. Basic Notions 


We consider experiments of N observations, indexed serially by 
1, --- , N, and assume that the observations follow model (1). The 
errors of observations together with their expectations, variances and 
covariances are given in (2) to (4), 


Y, + wy; = (2) 
= 0 (3) 
=o, to., t=s and j=t, 
and j (4) 
= 0, 


o3 and 2 being the between-plot and within-plot variances, respectively. 
Accordingly, the variance-covariance matrix of observational errors 
can be represented by a matrix V of order N X N, whose diagonal 
elements are (c; + o2) and whose off-diagonal elements are either 
o, or zero, according as whether the pair of observations involved are 
from the same or different plots. We can make the diagonal elements 
of V unity by dividing throughout by (of + 02). 

In this way all variances given throughout the rest of this paper 
are in units of V, = of + 2. In the cases where the observations are 
independent, then of course of = Oand V, = o2. Similarly, information 
will be given in units of V>*. It will do no harm in what follows to 
take V, = 1 and assume it understood where variance appears. 

If the values taken by the independent variables X, (including 
perhaps powers and products thereof) in the experiment are denoted 
by a matrix X (af order N xX k, if there are k independent variables) 
and the observed values by a column vector Y, then according to least 
squares theory (Aitken, [1935]) the minimum variance estimates of 
the column vector B of the k regression coefficients are given by (5), 


B = (X/V"X)'X'V"Y, (5) 


where V is the error variance-covariance matrix. Equation (6) gives 
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the predicted response at x. The variance of the ‘rue response at x 
is given by (7). 


Y(x) = B’x (6) 
Var Y(x) = (7) 


If we assume that the cost of making an independent observation 
is unity and the incremental. cost of taking another observation in 
the same plot is w, then the total cost of the experiment is given by 


W= G@ — dv] (8) 


where G, is the number of sub-plots in the 7-th plot. If we define 
information as the reciprocal of the variance and efficiency as informa- 
tion per unit cost, we will wish in general to maximize (for useful 
values of x) the efficiency given by 


E = W" Var" Y(z). (9) 


Efficiency is thus given as information (i.e., reciprocal variance) 
per cost unit (i.e., per cost of a single main plot unit.) For most experi- 
mental alternatives, the person who chooses the experimental plan will 
be satisfied to consider only the ratio, as we do here. In special cases, 
however, where the experimental alternatives differ drastically in type, 
the designer might wish to look also at the total work involved. 


3. Estimation of a Mean 


In estimating a mean, X is an N X 1 column vector of one’s. If 
all observations are independent, clearly KE = 1. With equally sized 
plots, each containing G observations, we can obtain V"' easily by 
partitioning V into a diagonal matrix of order N/G xX N/G, whose 
diagonal elements are matrices A of order G X G given by a The 
inverse of A is given by (11). 


1 r uu eee v 
roses 9 ¥& 

= + 02), - 
1+ (G — 2)r 


The 


The matrix V~' is then obtained from V by replacing A by A’. 
efficiency is easily shown to be as in (14). 
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E, = W Var" (Y) = W-(X’V"X) 
aT G N 
= G/1+ G@— + — 1)’rv). 
Group size for maximum efficiency will be obtained by differentiating 


(14) with respect to G, setting the derivative to zero.and solving for G. 
This optimum group size is given by 
1/2 


r rw 


Numerical values of the optimum plot size for selected values of r and 
w are given in Table II, together with the maximum efficiency obtained 
by substituting (15) into equation (14). Optimum plot size and maxi- 
mum efficiency increase as w and r decrease. (The value r = 0 corre- 
sponds to the case of independent errors). Efficiency is high except 
for combinations of large values of r and w. The sensitivity of the 
efficiency function (14) to variations in plot size G is illustrated in 
Table III, where the efficiencies have been calculated for plot sizes 
differing by +50 percent of the optimum; that is, for the nearest integral 
values of 1/2 G (optimum) and 3/2 G (optimum). The efficiency 
function is thus relatively insensitive to variations in plot size in the 
neighborhood of the optimum. 

Although similar work has been covered in other connections 
(Marcuse, [1949]), the treatment here is basic to what follows and 
perhaps the form of presentation is interesting in itself. 


4. Regression On One Independent Variable—Independent Observations 
In order to lay the groundwork for what follows, we first treat extra- 
polation in simple linear regressicn problems where the errors are all 


TABLE II 


Optimum Pior SizE anp Maximum Erriciency (IN PARENTHESIS) 
r = o3*/(os? + ee*) and w = proportional cost of taking an additional sub-plot observation. 


r w:1/100 1/75 1/50 1/25 1/5 

5 | 29.8(5.96) | 25.8(5.57) | 21.0(5.25) 14.7(4.01) | 6.0(2.00) 
3 15.2(2.53) | 13.1(2.49) 10.7(2.29) 7.5(2.02) | 3.0(1.34) 
5 9.9(1.67) 8.6(1.63) 7.0(1.56) 4.9(1.44) | 2.0(1.11) 
6.5(1.27) 5.6(1.25) 4.6(1.22) 3.2(1.07) | 1.3(1.02) 
9 3.3(1.05) 2.9(1.05) 2.3(1.03) 1.6(1.00) | 0.7(1.00) 


ong 
d 
ite 
ah 
! 


OPTIMUM ALLOCATION IN REGRESSION 457 


TABLE III 


EFFICIENCY FoR PLoT S1zE 
G = 1/2 G(Optimum) and G = 3/2 G(Optimum) (In Parenthesis). 


r w:1/100 1/50 1/25 

3 2.41(2.49) 2.06(2.24) 1.88(1.96) 

5 1.60(1.64) 1.51(1.53) 1.28(1.41) 

1.22(1.26) 1.16(1.20) 1.13(1.14) 
independent. Let p observations be made at x = —1 and q atx = 1. 


Then the total cost is W = p + q and the variance of the true response 
at x is as given in 
Var Y(z) = + + — ' (16) 


The experimental design is determined by @ = q/(p + q). The effi- 
ciency is expressible in terms of 6 as in (17). For maximum efficiency, 
6 satisfies (18) for interpolation and satisfies (18a) for extrapolation. 


E = 40(1 — + — 20) + (17) 
= 1/2(1 + 2), | x | 1 (18) 
1/2(1 4 1) (18a) 


Equation (18a) enables us to design the most efficient experiment 
for extrapolation at any given point. If prediction is desired over 
an interval x, to x, which does not overlap the interval x + 1, a minimax 
procedure can be used by setting 0 = O.,:imum for x, or t, , whichever 
is greater in absolute value. 


5. Regression On One Independent Variable—Non-Independent Observa- 
tions 


If we imagine an experiment in one plot of (q + p) sub-plot obser- 
vations, then the variance of the predicted response at the point x 
will be given by (19). 


Var Y(z) = r+ + p”) 


(19) 
The re- 


Total work in this experiment is W = 1 + (@ — 1)w. 
ciprocal efficiency P, is given in (20) and (21). 
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P, = = W Var Y(z) = [1+ — (20) 
F, = [1 + 2z(1 — 26) + 27]/40(1 — 06) (21) 


The most efficient experiment is obtained by minimizing P, . The 
solutions of P,/G = 0 and dP,/d@ = 0 are given by (22) and (23). 
Combining (22), and (23), we get (24). 


1 1 1 1/2 
= 1/2(1 x), | | < 1, (23) 
= 1/21 +4), [2|>1. 
1 1 1 1/2 
Come = (1-2-2544) <1, 
1 1 1 1/2 
= {1-1-1 41) » tel21, 


which give optimum plot sizes for interpolation and extrapolation, 
respectively. The optimum plot size for interpolation is identical 
with (15). 


6. Bilinear Regresston—N on-Independent Errors 
Suppose we have a bilinear expression (25) to be fitted, where 
y= bo + bit, + bore . (25) 


Z, is the between plots variable (q points at +1 and p points at —1), 
and z, is the within plots variable (m points at +1 and n points at —1). 
The matrix of sums of squares and products of the independent variables 
can be shown to be as given in (26), or as in (27) where the letters are 
as in (28). The inverse of (27) is given in (29). 


(m+nq+p)r 


1+(m+n-—1)r 
1 m—-n | 
q+p m+n 
(q — — n) (26) 
gq+p (q + p)(m + n) 


m—n (gq — p)(m — n) 1 + 4mnr 
(qtp(m+n) (1 —1r)(m+n)] 
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tse 
GH 
(27) 
C BC D 


G=m+n, H=q+p, 
g¢=m/(m+n), B= 20-1, C= 2-1, 


(28) 
p = Lt — ¢) 1) _ (1 — + 
l-r l-r 
1 + 1) 
— BC —B 
(D-C’)1-B) 1-B D-@C 
1 
i-P 
—C 1 
| 0 D-C. 


The variance of the predicted value at a point (z, , r2) is given by 
(30), where F, is as in (31). 
Var Y(x) = [1 + r(G — 1)F.]/GH, (30) 


D- BC 


b-c* 


1 1 


2 (31) 
_ _ (l—n(a — CP (x, — By 
"G-r+G0 -@* 1-8 


The total work involved is given by (32) and the reciprocal effi- 
ciency by (33). 


W = A{1 + wG — })) (32) 
Py = Bj! = WVar Yq) = MG 1) + 


The design of the experiment is determined by 6 and ¢, or equivalently 
by B and C, and the plot size is specified by G. For maximum efficiency, 
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we require P, to be a minimum. If the design is fixed, the optimum 
plot size G satisfies (34), where the coefficients are given in (35). 


G* + + + cG+d=0 (34) 


"an Vai — 2Bz, + 1) 


If the plot size is fixed, the optimum design is given by (36) 
(1/2)(1 + 21) and Povtimum (1/2)(1 + 2) 1 
or 1,2 (36) 


= and ™ Zs 


6. 1/2(1 + 1/z,) and Poptimum 1/2(1 + 1/22) > 1 


or 


which is identical with (23). Combining equations (34), (35) and (36), 
we obtain optimum plot sizes for interpolating and extrapolating 2, 
and z, as given in Table IV. As long as we interpolate for the within 
plot variable, the optimum plot size is identical with that for the esti- 
mation of the mean (15). 


i=1,2 


TABLE IV 
Optimum Pot Sizes FoR INTERPOLATION AND EXTRAPOLATION OF 2 AND 22 


z, (Between Plots) 


(within) Interpolation Extrapolation 
Interpolation k =(1 — 1/r — 1/w + 1/rw)'/2 k 
Extrapolation + — | | 


| 
a 
4 
| 
i 
| ) 
; 
7 ‘ 
i 
- on 
Ay 


OPTIMUM ALLOCATION IN REGRESSION 


7. Numerical Example 


The following data (Table V) came from a light-scattering experi- 
ment’. The response y is proportional to the Rayleigh ratio, which 
is a function of the amount of light scattered by the solution. The 
within-plot variable x. is recorded as sin’ 6/2, where @ is the angle of 
incidence of the impinging monochromatic light. Only the six smallest 
angles and the five lowest concentrations have been included in Table V; 
otherwise the regression of y on x, would not be linear and the indi- 
vidual within concentration regressions of y on x, would have signifi- 
cantly different slopes. 


TABLE V 


Data FROM A Licut-SCATTERING EXPERIMENT— 
THE RESPONSE IS PROPORTIONAL TO THE RAYLEIGH Ratio x 


Angles (x = sin? 9/2 X 10) 
0: 32, 35, 38, 40, 45, 50 
22: 0.7598, 0.9043, 1.0600, 1. 1698, 1.4644, 1.7861 
0.686 0.5591 0.6294 0.6943 0.71% 0.8178 0.9554 
0.874 0.5304 0.5831 0.6501 0.7030 0.8045 0.9394 ee 
1.060 0.5665 0.6183 0.7269 0.7309 0.8374 0.9429 ni 
1.290 0.6047 _ 0.6549 0.7068 0.7838 0.8858 0.9786 eS 
1.470 0.6154 0.6800 0.7320 0.7720 0.8979 1.0076 


2 is concentration in grams of polymer per 10 liters of solution. 


The analysis of variance is given in ‘able VI, giving estimates . 
of 0.000179, 0.000325 and 0.64 for o2 , o} and r, respectively. Equation . fe 
(13) gives u = 2.81 and v = —0.52. ‘These values determine the ey 
inverse of the variance-covariance matrix V~‘ of the errors of observa- p 
tions, and substitution in equation (5) gives the following regression 
equation. 
Y = 0.1885 + 0.09892, + 0.37802, . (37) 


The predicted response at x, = 0 = 2, is 0.1885 with a variance of 
2.6518, obtained from equation (7). In this experiment, a reasonable 
value of w, the proportional additional cost of taking another sub-plot Py) oie 
reading, is .02. The total cost of obtaining the data in Table V is, Sah en 
therefore, 5.5 units. From equation (9), the efficiency of the experi- . 
ment for estimating the response at x, = 0 = 2, is 1/(5.5)(2.6518) = 
.0398. The low efficiency is due to the large variance of a predicted 
value at an extrapolated point. 


?Data supplied by Dr. G. F. Sheats, 
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TABLE VI 
ANALYsIS OF VARIANCE OF Data IN TABLE V 


Expected 
Source MS. Mean Squares 


Between Concentrations 
Linear Regressicn on Concentrations 
Deviations from regression 
Within Concentrations 
Average Linear Regression on Angles 


0.002129 | +60? 


Bl Rome 
o 


Differences among Individual Regressions 0.000204 
Residual 0.000179 
TOTAL 


It might be of interest to compare the above efficiency with that 
of the optimum using the experimental design derived in this paper. 
With r = 0.7 and w = .02, 

(1 -}-143) 
Coding z, and z, as in (38), 
xz, = (x, — 1.078)/.392; 22 = (x2 — 1.273)/0.513, (38) 
so as to make their experimental values range from —1 to +1, the 
equation z, = 0 = z, corresponds to z, = —2.75 and z, — 2.48. From 
Table IV, the integer nearest to the optimum plot size G for extra- 
polating both z, and z, is 4.6 (2.75° + 2.48’ — 1)'”/2.75 = 6 and 
substitution of this value in equation (33) gives an efficiency of 0.1533. 
This is still low because of the extrapolation but it is high relative to 
that of the actual experiment. The efficiency of the experiment relative 
to the attainable optimum is 26.0%; however, the actual experirhent 
provided a test for non-linearity of the regression. . 
Of course, the values of r and of w used here are themselves only 


estimates, especially r, and changes in these might alter somewhat the 
remarks relating to efficiency. 


8. Conclusion 


This paper can be extended in several directions. Quadratic re- 
gression is an obvious extension, both for the simple linear and the 
multilinear case. The theory can also be extended to the case of the 
most efficient equally spaced design, and applications can also be made 
to response surface experiments, for example, with respect to rotatability. 
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AN EXPOSITORY NOTE ON ESTIMATION OF 
SIMULTANEOUS STRUCTURAL EQUATIONS 


R. L. BasMAnn 
Hanford Laboratories Operation, 
General Electric Company 
Richland, Washington, U.S. A. 


1. Introduction 


In a recent article appearing in this journal Turner and Stevens 
[1959] have presented an excellent discussion of regression analysis of 
causal paths based on the early work of Sewall Wright and others. 
The authors note that considerable work on the treatment of multiple 
equations has also been done by econometricians. (See the papers 
by Haavelmo [1943, 1944], Koopmans and Reiers¢l 1950].) The purpose 
of the present note is to make known to biologists and biometricians 
several methods of generalized linear estimation of structural parameters 
not yet widely known outside of econometrics, in particular the leas#- 
variance difference method. The latter method, in two different forms, 
has been introduced into the econometric literature by Theil [1953] 
and the present author [1957]; but the principle on which it is based 
and, indeed, the mathematical technique itself have been known for 
some considerable time in the Theory of Errors, the principle having 
been laid down and the technique fully adumbrated by Gauss [1821- 
23]. This principle, in Gauss’s words, asserts that ‘“The most plausible 
value of a quantity that is a given function of the unknown quantities 
(i.e. the parameters) of the problem is found by one’s substituting 
for the latter their most plausible values as determined by the Method 
of Least Squares.’”’ (Gauss [1823], pp. 34ff., pp. 102-3.) This principle 
foreshadows the well-known modern theorem about the invariance of 
maximum-likelihood estimators. 

The technique of least-variance difference estimation can be formu- 
lated exactly as Aitken’s [1935] generalization of Gauss’s method of . 
treating independent observations of unequal precision (Gauss [1809], 
p. 214) to the case of interdependent observations. 

The plan of this note is as follows: 

Section 2 contains a discussion of some logical features of causal 
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path models of the type presented by Turner and Stevens. Section 3 
displays three methods of estimation of parameters in such models 
within a unified framework of statistical theory. Section 4 describes 
the heuristic considerations that led to the present author’s initial 
formulation of the least-variance difference method. These considera- 
tions were inspired by the classical Gauss-Fisher least-squares problem 
as interpreted by Whittaker and Robinson ({1924] pp. 209-259), Wold 
({1953] pp. 187-214), Kempthorne ({1952] pp. 38-67). As this formu- 
lation involves nothing more than a slight generalization of the problem 
that motivated the Gauss-Legendre formulation of the method of least- 
squares itself, it could serve as an excellent introduction to the broader 
subject of this article. Since, however, the unity of all three methods 
described here can be better illustrated in terms of the somewhat less 
fundamental principle already mentioned, the latter approach is empha- 
sized. Section 5 lists some known statistical properties of the estimators 
described in Section 3. Section 6 contains a short numerical example 
of the least-variance difference method. 


2. Some Features of a Simple Model 


Let us consider a simple model like one of those discussed by Turner 
and Stevens. Diagrammatically, the hypothesized causal paths are 
exhibited by 


&4 


where, as in the Turner-Stevens article, we think of 9, , &, , && 
being conceptually “true” variables of which we have no direct exact 
measurement. We suppose, however, that errors in the observed 


variables x, , --+ , x, are negligible, so that x, = & ,k = 1, +++, 4. 
We can express the model functionally in two equivalent ways: Since 

~ no causal paths end at any & ,k = 1, --- , 4, we have 


which we call the systematic reduced-form or reduced regression equations. 
We can also express the model in its sfructural form 
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$2(m &,). (2.4) 

We must have 
= fi) (2.5) 


i in » » 
oa = fa (2.6) 


For simplicity let us hypothesize that the structural equations (2.3) 
and (2.4) are linear in the unknown parameters, and we write, in- 
troducing observed z’s for the ¢’s, the following structural model: 


m = Biome + + Yi2%2 + Vis (2.7) 
= Bam + Vests + + (2.8) 
from which it follows that 
= + + + + Tis (2.9) 
= + + + + Tos (2.10) 
provided 
| 29, (2.11) 
Ba —1 


Since, for example, 


Too Ba —1 Yo ‘Y22 
it follows that if the rank of Ty , where 


Ya 
is exactly r, = 1; then the rank of Ij , where 
Fee 
is exactly p, = 1, and conversely. Since the causal path hypothesis 
expressed by (2.7) and (2.8) specifies that y,; = 0,714 = 0, that 7,, ~ 0 


and Vi2 # 0, a1 = 0, 722 = 0 and that Y23 ~ 0 and Yu = 0, it follows 
that the rank of II is exactly p, = 1 and that the rank of Ij , where 


= 


4 
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is exactly p,, = 1. (It must be kept in mind, however, that the con- 
verse is not true; p, = 1 does not imply that any y,, = 0, h = 1, 2; 
k = 1,2. This means that, in principle, we can test only hypotheses 
about the rank of matrices like IIf , 1, and not about specifications 
like y:3 = 0 or y23 = 0. We obtain information about the latter only 
when we reject the whole class of specifications leading to the hypoth- 
esis p, = 1 as against p, > 1.) 


3. Estimation of Structural Parameters 
We now introduce the random variables «, , e, defined by 
' (3.1) 
€2 = Yo — Ne (3.2) 


(where y, and y, denote observed values of , and 7, respectively) 
and we shall suppose ¢, and ¢, to have the following properties: 


t«=1,2, (3.3) 
E(es:€:,) = ox; = const. h = 1,2, (3.4) 

and 
= 0, (3.5) 


t’,¢ = 1, 2, --- N observations. 
From (2.7) and (2.8), (3.1) and (3.2) we have 


Yr = Biome + + + + (3.6) 
and 
Y2 = + Yeats + + Y25 + (3.7) 
from (2.9) and (2.10) we have 
Yr = + + Mis + (3.8) 
and 
Yo = + + Wests + Wale + Ts + (3.9) 


By application of the principle laid down by Gauss ({1821], pp. 
34ff.), the least-squares estimates of Bi. , ¥1: , Yi2 » Yis are obtained 
from the normal equations corresponding to (3.6) by substitution 
therein of the least-squares estimates of 7,2. ,# = 1, --- , N obtained 
from the estimation of (3.8) and (3.9) Similarly, the least-squares 
estimates of B2: , Y23 » Y24 » Y2s are Obtained from the normal equations 
corresponding to (3.7) by substitution therein of the least-squares 
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estimates of n,, ,¢ = 1, --- , N obtained from the estimation of (3.8) 
and (3.9). 

[At this point it is desirable to emphasize that classical (probabilistic) 
least squares theory in no way sanctions the substitution of the direct 
observation (say) y:2 for m2 in the normal equations corresponding 
to (3.6) in the causal path model shown above. Some econometricians 
have followed this procedure, calling the numbers obtained by the 
name “straight-forward least-squares estimates,’ or even “classical 
least-squares estimates,’ but this practice is misleading and has been 
the source of much fruitless controversy in econometrics.] 

There are several ways to obtain least-squares estimates of 7,, and 
N:2 - These differ only in the degree to which estimates of the coeffi- 
cients 7,, and o,; ,h,7 = 1, 2;k = 1, --- , 4 are constrained to satisfy 
the hypothetical restrictions deduced from the structural model. We 
can, for instance, in estimating the coefficients in IIj and Ij, con- 
strain the estimates and to have ranks = = 1, as deduced 
from (2.7) and (2.8). Let the joint frequency function of ¢,, and €,. 
be the bivariate normal and be non-singular. From this distribution 
a likelihood function can be derived based on the sample observations. 
Let L be the natural logarithm of this likelihood function. Since the 
model (2.7), (2.8) implies 


— = O (3.10) 
and 
— = O, (3.11) 


the reduced-from parameters 7, , o,; , h, 7 = 1,2,k = 1, --- , 5, can 
be estimated by maximizing 


G = L+ — + — (3.12) 


where X, and \, are Lagrangean undetermined multipliers. The resulting 
estimates of 4. , on; , are called full-information with respect to the 
model. The (constrained) maximum-likelihood estimates of 7,, and 
» aNd , Obtained by the application of the Gauss principle 
just alluded to, are then substituted for 7,, and m2 in the normal 
equations corresponding to (3.6) and (3.7) as described above. The 
resulting estimates 6,, , 71: , etc. are called full-information estimates 
of structural parameters. It should be emphasized that the estimation 
process just described is conceptual only; the computational process, 
however, is equivalent to this conceptual substitution procedure. The 
same is true of leasi-variance ratio estimation and least-variance dif- 
ference estimation described below. (Cf. Koopmans and Hood [1953], 
pp. 159 ff; also Chernoff and Divinsky [1953], pp. 252-258.) 
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A second variant of the classical least-squares method is to impose 
on maximum-likelihood estimation of the z,, in (3.8) and (3.9) only 
a part of the constraints deduced from (3.6) and (3.7). For instance, 
in estimating (3.6) we use as the maximum-likelihood estimate of 
ne (call this 4,2) the value obtained from maximum-likelihood esti- 
mation of (3.8) and (3.9) constrained only by the restriction A, (ot 
fly) = 1, ie. by (3.10) only. This procedure is called the limited- 
information single equation method in econometrics. (Cf. Anderson and 
Rubin [1953]; also Koopmans and Hood, [1953], pp. 166 ff.) 

A third variant is simply to use in estimating equation (3.6) the 
unconstrained maximum-likelihood estimate of 7,2 (call this 4%) ob- 
tained by unrestricted least-squares estimation of (3.9); and in esti- 
mating (3.7), the estimate of 7,, (call this 74) obtained by unrestricted 
least-squares estimation of (3.8). This method, due independently 
to Theil and the present author, has since come to be known in eco- 
nometrics as two-stage least-squares, a poor nomenclature, since the 
full-information and limited information methods are two-stage least- 
squares in exactly the same sense. A more descriptive nomenclature 
would be to refer to the third variant as least-variance difference esti- 
mators (cf. Basmann [1959a]), which distinguishes them from the 
second-variant estimators, already called least-variance ratio estimators. 
(Cf. Koopmans and Hood [1953], pp. 168-169.) 


4. Alternative Formulation of Least-Variance Difference Estimation 


It is illuminating to note that this last method can be motivated 
on another basis, even more fundamental than the principle of Gauss 
alluded to at the beginning of this section. Let it be required to esti- 
mate (3.6). Then if the hypothesized form of the model is valid, it 
must be true that 


Ti3 — = ¥13(=0, hyp.). (4.1) 
Sig T4812 = ex hyp.). (4.2) 


Now let px , kh = 1, 2; k = 1, --- , 4 denote unrestricted least-squares 
estimates of 7, in (3.8) and (3.9). By analogy to (4.1) and (4.2), form 


ta = P23B12 = 75 (4.3) 
Pis — PosBir = - (4.4) 


Since not both 7,4 = 0 and 7 = 0, it is not possible to solve (4.3) 
and (4.4) directly for 6% , as the number of equations exceeds the 
number of unknowns. But this is the classical problem thatemoti- 
vated the development of the method of least squares by Legendre 
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[1806] and Gauss [1809]. Transparently there are several reasonable 
ways to obtain a value of 8,4 , one of which involves an application 
of the minimax principle, i.e. select that value of 6% such that max 
{| v3 |, | v4 |} is minimized, a generally applicable procedure suggested 
by Gauss [1809]. Another obvious procedure would be to minimize 
the sum of squares (y*)” + (y4)’ with respect to B% , but, because 
of the covariance between p,; and p,, and p,; and py» , it is plausibly 
better to minimize the quadratic form 


= Sss(vi8)° + + Salvit)” (4.5) 
with respect to 8,4 where the covariance matrix of p,; , Dix is 


-1 
Cov (pi; Dix) = ba (4.6) 
Sse Sse j,k = 3,4. 


The S;, are functions of the observed z, , --- , 4 only. 

Formally, this procedure is equivalent to the Gauss-Aitken (cf. 
Aitken, [1935]) method of handling interdependent disturbances in 
linear regression. 

The value of 8,8 obtained in this way is identical with that obtained 
by the method described as the third variant above. The same esti- 
mates 7;* and 7,3 as before are obtained from 


= Pu — , Vib = Diz — (4.7) 
5. Statistical Properties of Estimates 


If the hypothesis expressed by (2.7) and (2.8) jointly is valid, 
then all three methods of estimation discussed in Section 3 yield esti- 
mates that are consistent, asymptotically normally distributed, and 
efficient. The leasi-variance ratio method and least-variance difference 
methods yield consistent, asymptotically normally distributed and effi- 
cient estimates of those equations whose forms are validly specified, 
regardless of the validity of other specifications in the model as a whole, 
(ef. Anderson and Rubin [1949, 1950], Theil [1953], Basmann [1957, 
1958]), whereas the full-information method fails in general to yield 
such estimates if a false specification is set up. 

The three methods differ in respect of the quantity of information 
utilized in estimation of structural coefficients not hypothetically zero. 
In respect of those coefficients the full-information method plausible 
utiizles more information than the least-variance difference method. 
However, all information is utilized in different ways. The comparison 
of these alternative estimators in respect of their supposed relative 
finite sample variances is frustrated at the outset by the fact that in 
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the simplest case, e.g., in which just one of the x, , --- , x, is hypo- 
thetically excluded from a structural equation (say) equation (3.6), 
the frequency functions of estimates fail to possess finite moments. 
For £,. is but the ratio of two normal variables, namely, the unre- 
stricted maximum-likelihood estimates of reduced-form coefficients in 
(3.8) and (3.9). 

As Fieller [1932] has shown, the frequency function of this ratio 
has a component of the Cauchy type. Note, as Geary [1930] and 
Fieller [1932] have shown, if the sampling variance of the denominator 
of such a ratio is small relative to (the) its population mean, this diffi- 
culty with infinite moments becomes practically negligible. 

It is an essential property of the full-information and least variance 
ratio methods that the estimates #,, , --- , #25 of coefficients 7, , --- , 25 
in equations (3.8) and (3.9) are adjusted so that two exogenous variables 
can be eliminated between the estimates of (3.8) and (3.9) to obtain 
an estimate (say) of (3.6). Consequently, the estimators of B,. , Yun , 
Vis are 


Bis = #13/to3 = , (4.8) 
= — , (4.9) 
Ye = the — (4.10) 
and 
Tis = tis — (4.11) 
As the restricted estimates #,,; , --- , #25 are very complicated functions 


of the random variables ¢, , €, in (3.8) and (3.9), it is hardly wonderful 
that the joint frequency function of (say) #,; and #,; is not known. 
In general, however, the ranges of variation of the denominators #23 
and #,, are likely effectively’ to overlap zero, and the joint frequency 
functions (say) $,(#,3 , #2;) and $2(#,, , #2;) are not likely to vanish 
at the origin. In short, the frequency function of 8,, is not likely to 
possess finite moments, nor, in general, will the frequency functions 
of » Yi2 and 

Let S;,,j,k = 1, --- ,4 denote sums of cross products of deviations 
of the variables x, , --- , 2, from their sample means. For simplicity 
only, let S,, = 0,7 # k. The least-variance difference estimates are 


= (pis + + (4.12) 
491 = Dui (4.13) 


ji.e., within the limit of computational accuracy. Pr[ | #23 | < €] is not negligible, where € > 0 is 
the limit of computational accuracy. 
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412 = Pre — (4.14) 
and 

415 = Pis — Bispos (4.15) 
where p,, , ~~ , Pos are unrestricted maximum-likelihood estimates 
of ,°-* , %25- Their distributions are, of course, known. 


The essential comparison is to be made between (4.8) and (4.12). 
In contradistinction to #2; and #., which can effectively contain zero 
as an interior point of their domains of variation, the denominator 
of (4.12) can effectively vanish only at the boundary of its domain. 
Moreover, the distribution of the denominator in (4.12) divided by 
O22 is that of a non-central x’-variate with nor-centrality parameter 


A= (x23Sss + 22 (4.16) 


If the standard deviations of p,; and p,; are small relative to the popu- 
lation parameters z,; and 7,, , the discontinuity of 8,2 at the boundary 
will be negligible for practical purposes. [Consider the case in which 
the ratios of 2,3 , 7,4 to their respective sampling standard deviations 
are exactly two, so that A = 8]. 

The comparison between (4.8) and (4.19) remains valid even when 
systems of (say) G > 2 equations are involved. The estimators of 
endogenous coefficients 8,; are always ratios of determinants. In the 
least-variance difference estimators, the denominators are always posi- 
tive definite of the form 


D = |P:SP, | (4.17) 
where P, is the appropriate submatrix of reduced-form coefficients 
and S~‘'o,;,h,i = 1, --- , Gis the exact sampling covariance between 


the Ath and ith rows of P, . The discussion of the denominator in 
(4.12) can be appropriately generalized to (4.17). In the case of least- 
variance ratio and full-information estimators, the determinants forming 
the denominators can vanish at interior points of their domains of 
variation. 

The author has studied finite sample (V = 16) distributions of 
least variance ratio and least variance difference estimators by means 
of a sampling experiment in which estimation of a three equation model 
was simulated 200 times. (Basmann [1959, b, c, dJ]). The experi- 
mental population was designed to permit a high degree of precision 
in the unrestricted estimates of the reduced-form equations. Under 
the ad hoc assumption that this precision would permit the existence 
of singularities in least-variance ratio estimation to be ignored for 
practical purposes, the experimental means and variances were calcu- 
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lated. The results are to be reported in detail elsewhere (Basmann 
{1959d]). Briefly, however, they can be summarized as follows: In 
respect of experimental means, the lJeast-variance difference estimates 
did better than least-variance ratio estimates except in one case in 
which both were close to the actual value. Least-variance difference 
estimators all had means very close to the population value. In respect 
of mean squared deviations, least-variance ratio estimators did rela- 
tively poorly, the best being 2.7 times the magnitude of the corre- 
sponding least-variance difference statistic, the worst ratio being 7.5. 
In respect of medians, the least-variance ratio estimators did signi- 
ficantly better than least variance difference estimators in all but one 
case; medians of least-variance ratio estimators were in all cases very 
close to the actual population values. 

Experimental cumulative distribution functions were plotted and 
compared. In all cases of least-variance difference estimators the 
distributions can be approximated very closely by the normal form 
with mean at the population value and variances calculated from 
asymptotic formulas. The maximum discrepancies are near the centers. 
The modes have not yet been accurately determined, but appear to 
be very close to the population value in every case. The distribution 
functions of least-variance ratio estimates exhibit a marked Cauchy- 
like “thickening” of the tails except in one case in which this 
“thickening” is not significant. In view of this tendency, the medians 
are plausibly better estimates of the centers of distributions of least- 
variance ratio estimators. 

In respect of the degree of concentration of their distribution func- 
tions in the neighborhood of the population parameters, the least- 
variance difference estimators did uniformly a little better than the 
least-variance ratio estimators. (Both did considerably better than 
the so-called “straightforward least-squares’ estimates, although the 
mean-squared deviations of the latter were slightly smaller than the 
corresponding statistics for the least-variance difference estimators in 
all but one case.) 

The experimental results tend to discredit the author’s ad hoc 
assumption that the plausible non-existence of finite moments of least- 
variance ratio estimates could be ignored as a result of the high degree 
of sampling precision put into the experimental population by design. 

The foregoing brief discussion of finite sample problems and the 
experimental results obviously are not intended to be definitive on 
this matter. They do, however, lend emphasis to the fact that it is 
still problematic, even from a theoretical point of view as well as from 
practical considerations, whether there is any well-defined advantage 
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in imposing on estimates all the restrictions implied by a model. It 
will always he a practical economic question whether to utilize a greater 
proportion of computational resources to perform the iterative process 


_ of imposing all of the hypothetical restrictions as in full-information 


estimation, or even a part of them, as in least-variance ratio estimation; 
or to calculate least-variance difference estimators and utilize facili- 
ties to calculate other statistics of relevance to the same problem, e.g., 
principal components of the data, serial correlation matrices, corrections 
of variance estimates, etc. 

There is still room for considerable theoretical as well as experi- 
mental work on the finite sample properties of all three types of esti- 
mators. Some important aspects of the problem have been studied 
experimentally by Monte Carlo techniques (Wagner [1958], Yancey 
and Neiswanger [1958]). 

Tests of the rank hypotheses of Section 2 have been devised (cf. 
Anderson and Rubin (1949, 1950], Basmann [1959 c]). The test criterion 
calculated from least-variance difference estimation is, under the con- 
ditions of the previous paragraph, plausibly distributed as Snedecor’s F. 
This conjecture has been borne out by the sampling experiment already 
alluded to (Basmann [1959 c]). The same appears to be true of the test 
criterion based on least-variance ratio estimation (Basmann [1959 bj). 


6. Numerical Example 


In principle the computation of least-variance difference estimates 
is extremely simple. One merely estimates the reduced-form param- 
eters of the population defined by (3.8) and (3.9) by maximum-likelihood 
methods, in this case, least squares, then prepares sums of cross-product 
matrices appropriate to least-squares estimation of (say) (3.6) in which 
calculated values of 7», appear as regressors. Some aspects of this 
computation have been explained elsewhere (Basmann [1959 a]).?_ The 


~~ 


*Professors George Judge and Francis Walker have called my attention to the following error 
occurring in Sec. 4.2, p. 77. 


The variance formula actually used in computing the reaults on p. 79 is 


where My, is the matrix defined on p. 75 and Myy* is the matrix defined on p. 76. 
The variance formula on p. 77, which gives an alternative estimate of wii, is incorrectly stated 
in that the term 


is missing. 


The following misprint occurs in Section 3.1, page 75: The heading of the matrix My, should be 
changed from 27, tO Wi, ¥2, 
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computation of least-variance ratio and full information estimates is 
discussed fully in the paper by Chernoff and Divinsky [1953]. 

The following simpletwo-equation model will be utilized to illustrate 
computation of least-variance difference estimates. The data (arti- 
ficial) shown in Table 6.1 are hypothesized to be related by the two 
structural equations: 


TABLE 6.1 
ILLUSTRATIVE DaTA 
Ya 2s 
420.6 202.8 26 59 
307 .2 148.5 16 46 
323.4 158.8 10 59 
352.2 173.0 23 47 
301.2 145.6 15 47 
318.8 154.3 ll 57 
272.2 128.9 1l 45 
356.8 170.8 28 40 
324.4 157.8 18 47 
313.5 149.4 21 40 
Ys, = BisYe, + Vis + Ue, (6.1) 
Yes = + + + Ut, (6.2) 
where the (ui; , Ue2), / = 1, --- , N are independently and identically 


distributed according to the bivariate normal law (0, 2) where® 
= 
12 Wee 
The corresponding reduced-form equations are 
Yu = Fula + Mie + + , (6.3) 


Yro = + + + (6.4) 


where (6.4) is identical with (6.2). The (e,; , €2),4 = 1, --- , N, are 
independently and identically distributed according to the normal bi- 
variate law (0, 2) where 


3Note that the observed ys: rather than the “true” 2 appears in equation (6.1) and hence un * 
en, cf. equations (3.6) and (3.8). The reason for this change will be apparent in the calculation of 
estimated sampling variances of f1: and 413. Cf. (6.16)-(6.19) below. 
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TABLE 6.2 
Sums or Cross Propucts oF DrEvIATIONS, THE SYMMETRIC Matrix M 
Ya 
" 14572.7812 7179.4141 1721.1299 978.7891 
Y2 3565 . 2383 815.9905 531.6699 
1 372.9000 — 143.2999 
Z2 462.1001 


G22 

The first step in least-variance difference estimation of equation 
(6.1) is to estimate the parameters ,; ,h = 1, 2; 7 = 1, 2, 3, o,; , and 
the sampling covariance matrix of estimates of x, ,h = 1, 2;k = 1, 2. 
As the procedure for doing this is well-known, only the results are 
given here. 

The reduced-form estimates are 


Yu = 6.1641z,, + 4.02962,. + 22.4498 +%,, R= .9991 
a) (+.0919)  (+.0826) (+4.8713) (6.5) 
b) (67.0504) (48.7947) (4.6086) 


= 2.9862z,, + 2.07667,. + 4.4061+ 2. R= .9956 
a) (+.1031) (+.0926) (+.54614) (6.6) 
b) (28.973) (22.4286) (.8068) 


(6.7) 
1.0212 3.4892 
Pa Pee 
Cov (p11 , Dis Par » Das) = (6.8) 
00096 .00251 


The entries onitimes (6.5a) and (6.6a) are estimated standard deviations 
of the coefficient estimator appearing directly above. The entries on 
lines (6.5b), (6’6b) are ratios of the absolute magnitudes of the latter 
to the corresponding standard deviations. From (6.5a), (6:6b) and 
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(6.8) are derived the estimated sampling correlations between equa- 
tions: 


(Pir » Pas) = +.3270 (6.9) 
r(Pi2 Poo) = +.3281. (6.10) 


Since the model (6.1) and (6.2) specifies that the exogenous variables 
x, and x, enter only if y2, = m2, # 0 and y22 = m2. ¥ O, the entries 
on line (6.6b) are to be examined in this light. In view of the discus- 
sion in Section 5, the large values in (6.6b) tend to confirm the assump- 
tion that the possibility of infinite sampling variances of estimates 6,» 
and 4,; can be ignored. A stronger indication of this is afforded by 
the appropriate entries of (6.5b) and (6.6b) taken in conjunction with 
the correlations (6.9) and (6.10). (Fieller [1932], Geary [1930]). The 
reduced-form statistics calculated in this example also tend to confirm 
the assumption that least-variance difference estimators derived from 
the population sampled will have finite moments, and have marginal 
distributions that can be approximated closely by the normal form. 
(Basmann [1959 d]). Moreover, these statistics tend to confirm the 
assumption that the distribution of the identifiability test criterion 
calculated below can be approximated closely by Snedecor’s F with 
1 and 7 degrees of freedom under y,, = 0, ¥:2 = 0. (Basmann [1959c]). 

(In econometric applications little attention has been paid to the 
problem of checking internal consistency of the outcome of unrestricted 
estimation of reduced-form and the methods applied to structural 
equations.) 

To calculate least-variance difference estimates of 8. , 413 . it is 
first necessary to calculate,‘ from Table 6.1 and equation 6.2, 


m*2 = + = 3540.7765 (6.11) 
= + Pi2My222 = 7172.2641 (6.12) 


10 10 10 
= Pa 2. LisYr2 + Poe 7. 
t=1 t=1 t=1 (6.13) 


10 


+ pia D> Yo = 256,318.1826. 


t=1 


B,2 and 7,; can be calculated directly: 
Bis = = 2.0256, 


(6.14) 


‘The results shown were obtained on an electronic computer (IBM 709). Some of the intermediate 
results were not printed out, and therefore had to be back-calculated, in order to be presented here. 
Consequently, the reader should expect to obtain slightly different results if he carries out calculations 
beginning with Table 6.1. 
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4:3 = 9:1 — = 6.9799. (6.15) 


The estimate @,, of w,, in 2 is calculated from reduced-form parameters 
h, ‘= 2, and Bis 


On = én — + = 12.9555. (6.16) 
Estimated sampling variances and covariances of §,, and 7,3 are obtained 
from 


= = 003659; (6.17) 
= Our(yt)?/Nm*? = 9.3799; (6.18) 
Cov (613 , 413) = = —.05817. (6.19) 


It will be recalled that the identifiability hypothesis y,, = 0, 7,. = 0 


“an (6.1) necessarily implies, but is not necessarily implied by, the rank 


~condition (cf. Section 2) 
11%22 — = (6.20) 


not identically. The sample information, not already utilized in esti- 
mation of 8,2 , 9:3 ,@:: , can be employed to test (6.20). [It is emphasized 
that only if (6.20) is rejected at some level of significance is it possible 
to “test’’ the identifiability of equation (6.1), ie., the specification 
Yn = 0, ¥12 = 0.] The likelihood function, with least-variance dif- 
ference estimates f,. , 4:2 , etc. as arguments is 


L | wa, (6.21) 
where the elements of w are 
tas = (N — += 1,2 (6.22) 
and 
_ G 
Z= (6.23) 
with 


G,(A) my 2Biomyry2 + 

= Wi 2812W + 
@lasmann [1957], [1958]). Consequently, the appropriate likelihood 
‘ratio criterion is 


(6.24) 


and 
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2InvA = GB) (6.25) 
is asymptotically distributed like x° with K, — G + 1 degrees of freedom. 
(In this example K, = number exogenous variables excluded = 2, 
G = number endogenous variables present = 2). If, however, the 
population allows of a high degree of precision in the estimation of 
reduced-form equa*‘ons, the distribution of 


- G@) 
G + 1 12(8) 
is closely approximated by that of Snedecor’s F with K, — G + 1, 
N — K degrees of freedom (Basmann [1959 c, 1959 dj). 
In this example one obtains, by appropriate calculations, the fol- 
lowing values of alternative test criteria, 


(6.26) 


—2 Ind = 2.799; ns at a= .05; (6.27) 
and 


- F = 1.9593 (6.28) 
with (1, 7) degrees of freedom, non-significant at a = .05. 
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QUERIES AND NOTES 
D. J. Finney, Editor 
152 NOTE: The Question of Stability with Positive Feedback 


Everett R. DEMPSTER 
University of California, Berkeley, California, U.S. A. 


In the interesting and instructive article by Turner and Stevens in 
Biometrics 15, 236-258, they write on page 251 that “Positive feedback 
is obviously unstable.”’ This statement does not appear to be uni- 
versally valid. In general, equilibrium values may be considered stable 
if deviations lead to corrective forces that either restore the equilibrium 
values or, at least, lead to asymptotic approaches to such restoration. 
To determine whether this occurs even in simple linear feedback systems 
sometimes requires very careful specification of such properties of the 
systems as the rates at which deviations occur and the rapidity with 
which the resultant corrective forces come into action. However, it 
seems that only very extravagant assumptions in these respects could 
account for instability in a linear system of the type indicated by the 
diagram of page 252 (reproduced below) in cases where the absolute 
value of the feedback factor a2:0;2 (in the denominator of equations 18) 
is less than unity; in fact stability is achieved with moderate feedback, 
whether positive or negative in sign, under extreme conditions for which 
even strong negative feedback is unstable. 


2 


If, for simplicity, we assume the intercepts (see page 241) to be 
zero, the “structural equations” corresponding to the diagram above 
are as follows: 


Ne = - 


Combining these an equilibrium value is obtained: 
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If we now specify that ~, changes suddenly from zero to some steady 
value # and produces an immediate effect on , , but that a slight 
interval elapses before feedback through 7, modifies the value 7, , 
then the initial value of 7, is a,,£% , after one cycle of feedback it becomes 
+ after two cycles it is + + 
and so on, one term being added in the brackets after each cycle. 
If the feedback action is rapid, », quickly approaches its equilibrium 
value, given by the sum of the geometric series, which is of course 
equal to a,,¢//(1 — a2ia:2), the same value indicated in the expression 
of the previous paragraph. A convergent series is not obtained if 
| @1:0;2 | > 1. Where it is less than unity, however, and regardless 
of whatever equilibrium values of ¢, and 7, may be assumed as a starting 
condition, if the equilibrium is upset by a sudden change of £, to the 
value £ , then 7, will assume, successively, values obtained by adding 
terms to a convergent series whose sum is the same as that indicated 
by the equilibrium expression above; hence the point of equilibrium 
for any steady value of £, is, according to the definition stated, a stable 
one. 
Although mechanical and electrical instances of linear positive feed- 
back abound, biological examples, while they may be numerous, are 
at least not immediately apparent. The following situation, however, 
may serve for purposes of illustration. Suppose that fresh nutrient 
solution and bacteria are introduced into a chemostat of volume V at 
constant rate. Let v be the volume introduced and n the number 
of bacteria added per unit time, and assume thorough mixing so that 
the overflow of volume » per unit time is a representative sample of 
the entire contents. Let v/V = p. If the bacteria in the chemostat 
do not reproduce and are not destroyed, the equilibrium value of N 
is clearly the product of the number n introduced per unit time, and 
the average time of sojourn of a bacterium in the chemostat (or equally 
the length of time required to introduce volume V of solution) which 
is 1/p. However, if the bacteria do reproduce, at the rate of r divisions 
per bacterium per unit time, the effective total number of bacteria 
added to the chemostat per unit time becomes n + Nr, and the total 
number at equilibrium (if such is attained) becomes (n/p) + N(r/p). 
Referring to the previous diagram, if we let N be the value of 7 , 
then we can consider the value of £, to be n, that of a,, to be (1/p), and 
the feedback factor a,,a,. to have the value (r/p). This is then positive 
feedback directly proportional to 7, , and the system corresponds to 
that of the diagram except for the simplification introduced by the 
omission of the intermediate variable 7, . 

By analogy we would expect the equilibrium value of N to be 
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(n/p)/{1 — (r/p)]. However, the direct effect of cause ~, (with value 
n) on (with value NV) would not be instantaneous, and the feedback 
of », on itself would also be slightly delayed. Therefore, even though 
the equilibrium value suggested above turns out to be correct, it is 
desirable to start with a differential equation in order to discover the 
manner in which N varies with time. The rate of change of N with 
time is the sum of nm (number of bacteria introduced per unit time), 
Nr (the increase in number of bacteria due to division), and —Np 
(the number of bacteria lost due to overflow). We therefore write: 


dN = (n+ Nr — Np) dt 
and, where n = 0 when ¢ = 0, this leads to the following result: 


N = — (/p)). 

This expression indicates that N builds up smoothly to its equilibrium 
value (n/p)/{[1 — (r/p)] providing r < p. The equilibrium would 
be a stable one (under the assumed conditions) since, where r < p, 
a slight excess in N would increase the loss of bacteria by overflow 
more than it would increase the gain by division, and vice versa in 
the event of a deficiency in N. If r = p, the expression does not hold 
inasmuch as the derivation involves a division by 1 — (r/p). Ifr > p, 
the numerator and denominator both change sign and, as ¢ increases, 
the value of N builds up at an accelerating rate without limit. In 
practice of course, if r > p, the number of bacteria would build up 
to a point where the nutrient is no longer adequate to maintain the 
reproduction of bacteria at the original rate 7, and an equilibrium would 
be established at some new value of r less than p. Experimentally 
(at lower levels of bacterial density), if we now consider the number 
of bacteria per unit volume as the variable instead of N, the feedback 
factor could be varied alone by changing the volume of the chemostat. 
At low chemostat volumes (r/p) would be small, and equilibrial bacterial 
densities should be only slightly affected by changes of volume; at 
higher volumes where (r/p) approaches unity, the increase of density 
due to small reductions in volume should be very great. 


153 NOTE: Straight Line Regression through the Origin 


Matcoum E. Turner 


Department of Biophysics and Biometry, Medical College of Virginia, 
Richmgnd, Virginia, U. S. A. 


There are many practical problems in which it is reasonable to 
assume the relationship represented by a straight line passing through 
the origin, 
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Ely.) = 6a, (1) 


where E(y,) is the expectation of y for a given x. The slope @ is generally 
unknown and hence requires estimation from experimental data. It 
is well known that the weighted least squares estimate of 8 is given by 


B= wz’, (2) 


where the weight w is inversely proportional to the variance o? . If 
yz has a normal distribution about 8z with variance o2 , then (2) is 
the maximum likelihood estimate of 8 as well. 

Three special cases are commonly considered: 


(i) w = 1, B= 2’. (3) 
(ii)w = 1/2, =[D (/2)/N). (4) 
(iii)w=1/z, B= (5) 


Case (i) is often employed at some distance from the origin but is 
usually inappropriate in the vicinity of the origin. However, under 
rather simple assumptions about the variances of normal models, we 
obtain the convenient simplified estimators (4) and (5). 

It does not seem to be generally realized that other simple, and 
often more realistic, assumptions about the distribution of y, can lead 
to these same estimators. Thus it happens in many applications of 
straight lines passing through the origin that negative values of y, 
are impossible or implausible. This perhaps is generally the case. 
Acceptance of the normal model on the other hand implies the existence 
of negative values and, if one is working near the origin, this discrepancy 
can be quite serious. 

Let us consider an alternative specification of the distribution. 
Suppose that y, follows the gamma distribution 


fy.) = (6) 


Now, E(y.) = n,b, = Bx and of = n,b2 = Brb,. Thus b, = Bz/n, 
and n, = B’x"/o2 . Consider case (ii). If w = 1/z’, then o? is pro- 
portional to x” and n, = n, a constant. Substitution of n for n, and 
Bx/n for b, in (6) gives 


f(y.) = (7) 
and the log likelihood is 


L = mN logn — mN log B — n > log x — N log I(n) 
+ (n— 1) logy — @/) (y/2). 


(8) 
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QUERIES AND NOTES 


Then 


the same estimator found previously by assuming a normal distribution 
for y. . 

An interesting special case arises when n = 1. The gamma distri- 
bution (7) degenerates to the exponential distribution and observed ay 
data take the form of a triangular density of points with greatest 
density in the vicinity of the x-axis. I have seen data of this description 
discarded off-handedly as ‘worthless’ merely because the traditional 
bell-shaped distribution did not occur. 

The estimator (5) for case (iii) is alternatively generated as the 
maximum likelihood estimator for a Poisson specification. We have 


Ply.) = (ax), (9) 


and the log likelihood is 


L= log ty) + Dy loge+ logz—B diz, 


and thus we get, by equating 8L,/ag to zero, 8 = >> y/)>_ x, which is 
identical with estimator (5). 

Of course, still other alterative specifications (infinitely many in 
fact) giving rise to estimators (4) and (5) are possible, but these two 
examples should suffice to illustrate the fact that dependence upon 
the normal distribution is not necessary for the sake of providing 
simple estimators. Here models even more realistic in certain circum- 
stances have given rise to precisely the same estimators as have been 
commonly recommended on the basis of normal theory. 
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BOOK REVIEWS 


J. G. Editor 
Members are Invited to Suggest Books for Listing or Review to the Editor 


TORII, T. The Stochastic Approach in Field Population Ecology with special 
3 reference to Field Insect Populations, Uéno, Tokyo: Japan Society for the 
Promotion of Science, 1956. Pp. 277. 35s. 


R. E. Buacxrra, Imperial College of Science and Technology 
(University of London), London, England 

In insisting on the importance, and the practicability, of understanding the 
biological processes which give rise to the sampling distributions of ecological 
practice, Professor Torii’s book invites comparison with the long, elegant, paper 
of Arbous and Kerrich on the distnbution of accident statistics, (Biometrics 7, 
‘1951, 340). The book, whilst both prolix and inelegant, contains some unusual bio- 
“metrical techniques; in particular, it revives Student’s idea, of computing the correla- 
tion between the numbers of organisms in a quadrat and those in contiguous quadrats, 
to examine the small-scale heterogeneity of the population. The idea is then ex- 
panded so that this structural autocorrelation for the whole area reflects the type 
of distribution to which the population conforms. 

This account of the author’s field experiments gets away to an indifferent 
start, making heavy weather of the elementary statistical concepts employed. 
Prof. Torii describes himself as ‘non-mathematically trained’ and the result of 
his simultaneous struggle with the English language and mathematical ideas is 
sometimes bewildering, however much the reader may sympathise with the author’s 
problems. This part, the first of four, is notably the worst in style, and has more 
-than its fair share of the 460 misprints noted in the book. In part 2 there is a useful 
distinction between non-random distributions arising from direct interactions be- 
“tween insects and those imposed by the heterogeneity of their environment. 

Part 3 shows the author’s enthusiastic industry at its best. Fluctuations of 
insect activity in time are assessed by means of careful experiments, sophisticated 
multivariate analyses, and detailed interpretation. The fourth part is a lengthy 
suramary of the work, which closes with nearly 250 references, many to Japanese 
work little known outside that country. 

There are several statistical lapses in the text, fortunately without serious 
practical consequences. On p. 77, after rejecting a use of the t-test because of a 
moderate inequality of variances, the author, apparently unaware of the Fisher- 


Behrens test, uses.a large-sample approximation which is no more appropriate. 


On p. 92 and elsewhere, the switch from the transformation (z + 0.5)’ forz < 10 
ue > 10) is made within one and the same analysis, a practice likely to bias 
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estimated components of variance. The author’s policy of sifting his data through 
every available significance test, with scant regard for the accumulation of false 
positives, leads him to correlate the number of male dungbeetles in cowpats with 
the total number of males and females found in them; the explanation for the ‘sig- 
nificant’ correlation is unconvincing (p. 165). 

An ecologist, with the patience to winnow the grain from the chaff, may be 
stimulated by this book, which deals with problems that tend to be left to the 
biologist by the statistician and conversely. 
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ABSTRACTS 


The following are abstracts of papers presented al meetings of the 
British Region during the session 1959/60. 


P.S. HEWLETT (Pest Infestation Laboratory, Slough, England) and R. L. 
675 PLACKETT (Department of Applied Mathematics, University of Liverpool, 
England). Models for Quantal Responses to Mixtures of Drugs. 


In the past, three mainly independent. approaches have been made to the 
quantitative analysis of responses to mixtures of two drugs; by examining, especially 
graphically, the relations between the doses of the drugs together producing given 
levels of response, by constructing mathematical models relating response to doses 
on the basis of biochemical drug-action kinetics, and by constructing models of a 
similar type through the coupling of relatively simple, general biological ideas with 
statistical concepts. 

A general method proposed for developing the models can in effect combine 
all three approaches. Each model is developed in three stages. (1) Conditions of 
quantal response in the individual organism are postulated in terms of events at 
the site (s) of biological action, sites of loss, etc. If appropriate, biochemical drug- 
action kinetics can be introduced at this stage, and the quantal response can be 
regarded as occurring when a graded response reaches a certain level of intensity. 
(2) Assumptions about the relation between the dose and the amount of drug reach- 
ing its site of action enable the conditions of (1) to be expressed in terms of doses 
and individual tolerances. (3) Finally, appropriate statistical assumptions, e.g. 
of a bivariate distribution of tolerances, give a relation between the proportional 
quantal response in a group of organisms and the doses of the drugs. 


J. G. SKELLAM and M. D. MOUNTFORD (The Nature Conservancy, 
676 London). Simultaneous Growth Diffusion with special reference to Popula- 
tion Dynamics and Ecological Genetices. 


Distribution maps, showing the growth and spread of biological populations 
in space and time, often suggest that under reasonably uniform conditions the 
area of distribution might be expected to expand radially with approximately 
uniform velocity. This conjecture is analogous to that of Fisher [1937, Ann. Eng. 
Lond., 7: 355-69] who represented the spread of an advantageous mutant gene in 
a linear habitat by the partial differential equation 


of/at = kA*f + mf(1 — f)i (k, m positive constants) 
in the particular case, i = j = 1, A? = 0?/ax?. 
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This equation, in both one dimensional and two dimensions] form (A? = @°/dz? + 
0*/dy?), was studied numerically in three main cases: (1) the intermediate case 
(Fisher’s), (2) the dominant case (¢ = 1, 7 = 2), and (3) the recessive case (¢ = 2, 
j = 1), the initial condition always being a small compact mass in the neighbourhood 
of the origin, representing a few initial individuals or mutant genes, 

In one dimension the asymptotic solutions of all three cases are waves of constant 
form moving with uniform velocity. In two dimensions the results are similar, 
except in the recessive case, where apparently a wave does not build up. 

In the case of Fisher’s equation and its two dimensional analogue, the numerical 
evidence strongly supports Fisher's conjecture that the velocity of propagation 
which ultimately prevails is the minimum theoretically possible. 


J.O. IRWIN (Medical Researeh Council Statistical Researeh Unit, London 

677. School of Hygiene and Tropical Medicine, London). The Contributions of A. G. 
McKendrick to the Theory of Stochastic Processes and their Applications to 
Epidemiology. 


The main object of the lecture was to described a paper of great importance 
written more than 30 vears ago, which until fairly recently escaped notice 
altogether,—“‘Applications of Mathematics to Medical Problems’, Proc. Edinburgh 
Math. Soc., 44, 98-130, 1925-6. Much of the work on stochastic processes in this 
field, re-discovered in the last ten or fifteen years, Was anticipated by MeKendrick. 

Among the univariate distributions he discovered as solutions of stochastic 
differential equations are the Poisson and Negative Binomial distributions and 
the Bessel form for the distribution of the difference of two Poisson Variates. He 
gives an extremely useful method for fitting a negative binomial or Poisson distri- 
bution to data with the zero class missing (see Biometrics, 15, 324-6, 1959). He 
also derives the bivariate analogue of the Poisson for correlated variates. 

MeKendrick’s models for two-dimensional cases are always illustrated by 
diagrams in which a representative point may pass from one compartment to an 
adjacent compartment horizontally, vertically or diagonally, the transition prob- 
abilities being indicated by arrows. Restricted cases in which entry into certain 
compartments of the model are not possible, are of particular interest. Problems 
considered under this head include those of house to house infection, and infection 
and relapse in malaria. He discovered the usual equation for the continuous infec- 
tion model with limited and unlimited populations, and found from it the distri- 
bution of total number of cases up to n = 6; thus deriving some of the U-shaped 
distributions discussed in recent years by D. G. Kendall, M.S. Bartlett and N. T. J. 
Bailey. 

Finally he considered the limiting form of Lis stochastic equations, when vari- 
ables are not restricted to integral values but become continuous. This led him 
to the three dimensional form of the diffusion equation, and other equations similar 
to those of mathematical physics. Applications of these to demographic problems 
are given, 


R. kk. BLACKITH. (Imperial College Field Station, Sunninghill, Ascot, 
678 Berks. England). Some Taxonomic Applications of Discriminatory and 
Canonical Analyses. 


Regent trends in taxonomy include the appreciation of multiple characters 
instead of a simple dichotomy, an increased use of quantitative techniques, and a 
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greater reserve concerning the injection of phylogenetic speculation into classifi- 
cations. The progression from qualitative to scalar methods, and thence to vector 
and, no doubt ultimately, to tensor analysis, seems to reflect not so much a new 
trend but the realization of the aims of the Pythagorean school, for so long stulti- 
fied by the nature of Aristotelian classification which formed the basis of Linnaeus’ 
work. 

Discriminant functions specify not only the extent to which groups of organisms 
have diverged (in terms of the generalised distance) but also the direction of those 
changes of form which have been exploited during the evolution of the groups. 
In the Acrididae, specific differences of form seem to be exploitations of polymor- 
phism within the species, so that one may think of the discriminants which link 
the morphs as signposts to possible courses of evolution. The directed variation 
in putatively homogeneous material, as disclosed by factor or principal component 
analysis, may serve the same ends. 


J. H. EDWARDS. (Medical Research Council Population Genetics Research 
679 Unit, Headington, Oxford). The Association between Blood Groups and 


Diseases with special reference to maximizing likelihood on electronic com- 
puters. 


Recently data have been accumulating of the association of blood groups and 
diseases using the unaffected sibs of the propositi as controls for the propositus 
and any affected sibs. When more than one blood group system is affected, or 
when some other factor is important such as sex, maximum likelihood estimation 
of the parameters involved is very tedious by conventional methods due to the 
complexity of the functions to be differentiated. 

By explicitly determining the parameters of the hyper-parabola defining the 
log-likelihood hypersurface, by first ascending the peak of this hypersurface by 
some form of steepest ascent technique, and then estimating the log likelihood for 
various values of the parameters involved, maximum likelihood estimates, with 
their variances and covariances, may be computed without explicitly obtaining 
the derivatives or partial derivatives algebraically. This form of analysis is par- 
ticularly suited to electronic computation in circumstances in which programming 
the derivatives or partial derivatives is difficult or impossible. 

The assumptions of a normal sampling distribution are identical to those usual 


- in conventional estimation using Newton’s method. 


Analysis of the data of Dr. C. A. Clarke and his colleagues on the ABO and 
secretor status of patients with duodenal ulcer and their sibs showed clear evidence 
of an association within sibship which was considerably, but not significantly, less 
than the association derived from population studies. 


ROBERT R. SOKAL (University of Kansas, Lawrence, Kansas and Galton 
680 Laboratory, University College, London). Factor Analysis of Biological 
Correlations. 


The advent of high speed computers has greatly simplified the mechanics of 
multiple factor analysis. However, the method involves a number of contro- 
versial assumptions and procedures. The relevance of these controversies to bio- 
logical problems is discussed. In biology dynamic models are to be preferred over 
the static ones of psychological factor analysis. Reification and identification of 
factors is a legitimate goal of the analysis of some biological matrices. The talk 
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is illustrated with a number of examples from different fields of biology. (In col- 
laboration with Howell V. Daly and F. James Rohlf.) 


B. T. WARNER and H. O. J. COLLIER (Department of Pharmacological 
681 Research, Parke, Davis & Company, Hounslow, Middlesex). The Analysis 
of Correlated Quantal Variates arising from a Test for Analgesics. 


The assessment of analgesics in laboratory animals is based on the depression 
of the normal response to @ noxious stimulus. 

We have developed a test for analgesics using the suppression of the squeak 
response obtained when an artery clip of suitable tension is applied to the toes of a 
young guinea-pig. The novel statistical feature of this test is that a guinea-pig 
has 14 toes so that up to 14 quantal responses may be obtained from a single animal. 
Standard methods of quantal analysis are not applicable without modification 
because the responses of different toes of the same animal are not independent. 

For estimating the ED50 of an analgesic drug, its 95 per cent fiducial limits 
and for test of validity, etc., a method of analysis is described. An important 
by-product of the analysis is the increase of information provided by the use of 
several toes instead of only one. 

On the basis of a simple model for correlated quantal variates, a relationship 
between information obtained and number of toes used is suggested. The relation- 
ship is confirmed by analysis of experimental data. The wider implications of 
using different numbers of toes are considered. 
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Correction 


A.P. DEMPSTER. A Significance Test for the Separation of Two Highly Multi- 
variate Small Examples, Biometrics 16 (1), 41-50, 1960. 


The second sentence of the paper on page 41 should read: ‘In this example, 
an experiment is performed on 12 human subjects of whom 4 were compulsive 
drinkers and 8 were healthy males from 16 to 39 years old showing no alcoholic 
tendencies.” 
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THE BIOMETRIC SOCIETY 


LE PROFESSEUR G. DARMOIS 


La Société Frangaise de Biométrie a le profond regret de faire part du décés 
de son ancien Président, le Professeur Georges DARMOIS, survenu & Paris le 3 
Janvier 1960. 

Le professeur G. Darmois était né en 1888 4 Eply (Meurthe et Moselle) oa il 
fit ses premiéres études, qu’il poursuivit au Lycée de Nancy, puis 4 l’Ecole Normale 
Supérieure. Au cours d’une longue carriére universitaire, il fut successivement 
professeur 4 la Faculté des Sciences de Nancy, puis & la Faculté des Sciences de 
Paris, ov il occupa la Chaire de Calcul des Probabilités et Physique Mathématique. 

Les travaux du professeur Darmois se sont excercés dans de nombreux domaines: 
géométrie, théorie de la relativité, astronomie, avant qu’ils ne s’orientent plus 
particuligrement—mais non de facgon exclusive—vers le calcul des probabilités 
et la statistique mathématique. 

Son got du concret l’amena rapidement 4 s’intéresser aux applications de la 
statistique, et & cet égard il joua un réle déterminant dans la diffusion en France 
des méthodes statistiques et probabilistes. A l'Institut de Statistique de ]’Uni- 
versité de Paris, dont il était le Directeur des Etudes, il créa de nombreux cours 
d’application: agriculture, industrie, économie générale, économétrie, analyse 
factorielle, génétique de populations. En 1952, il établit la liaison entre 1’Un:- 
versité et les Entreprises par la création du Centre de Formation aux Applications 
Industrielles de la Statistique, qui assure une formation statistique accélérée des 
ingénieurs et des cadres de |’industrie et publie la Revue de Statistique Appliquée. 
Il créa ensuite le Bureau Universitaire de Recherche Opérationnelle, consacré 4 
l’enseignement et aux applications, et organisa 4 |’Institut Henri Poincaré un 
enseignement de calcul automatique. 

Le professeur G. Darmois a représenté la France 4 la Commission de Statistique 
du Conseil économique et social des Nations Unies; il a été membre de la sous- 
commission des Sondages Statistiques des Nations Unies, Directeur du Centre 
Européen d’application Statistique agricole et démographique. II était depuis 
1953 Président de l'Institut International de Statistique, et il avait été élu en 1957 
membre de |’Académie des Sciences. 

Tous ceux qui ont connu le professeur G. Darmois, tous ceux pour qui il fut 
un conseiller inlassable ou un ami respecté, garderont le souvenir de ce grand savant 
qui fut aussi un parfait humaniste. 


British Region 
Two papers given at a meeting on February 23, 1960 were: 


R. E. BLACKITH—Some taxonomic applications of discriminatory and 
canonical analyses. 
J. H. EDWARDS—The association between blood groups and diseases. 
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At a meeting on April 13, 1960 the following papers were read: 


R. R. SOKAL—F actor analysis of biological correlations. 
B. T. WARNER and H. O. J. COLLIER—The analysis of correlated quantal 
variates arising from a test of analgesics. 


ENAR 

The Spring meeting of the Eastern North American Region was held jointly 
with the Institute of Mathematical Statistics at Columbia University in New York 
City on April 21-23, 1960. Eighty-five members of the Biometric Society attended. 
ENAR sponsored or co-sponsored the following nine sessions: 
INVENTORY AND QUEUEING THEORY. 


Chairman: Howard Levene—Peter E. Ney: A multi-server queueing problem. 
Jerome Sacks: Queues in series.’ Lajos Takacs: On the transient behaviour of a 
process with batch service. 


CONTRIBUTED PAPERS. 


Chairman: Marvin A. Kastenbaum—Malcolm E. Turner: A biometric theory of 
middle and long distance track records. Robert E. Serfling: A simple mathematical 
model of bias arising from effectiveness rates calculated from heterogeneous popula- 
tions with some error in diagnosis. 


INVITED ADDRESS—I. 


Chairman: Boyd Harshbarger-James Durbin: Two methods of constructing 
exact texts. 


SELECTED TOPICS—I. 


Chairman: Clyde Y. Kramer-H. A. David and Carmen A. Perez: On comparing 
different tests of the same hypothesis. C. Derman: On the replacement of periodically 
inspected equipment. Ernest M. Scheuer: Lower bounds on the probability associ- 
ated with certain confidence regions for the multivariate median. 


METHODS IN MULTIVARIATE ANALYSIS. 


Chairman: J. Blumen-R. Bargmann: Multivariate analysis in psychology and 
education. H. Rosenblatt: Multivariate experimental designs. 


SOME PROBLEMS IN THE ANALYSIS OF VARIANCE. 

Chairman: Donald A. Gardiner-Howard Levene: Analysis of variance of vari- 
ances. Leon Herbach: Properties of model II and of mixed models. 
PROBLEMS IN MEDICAL STATISTICS. . 


Chairman: John W.. Feriig-C. C. Spicer: A new closed sequential scheme for 
clinical trials. Donald Mainland: Some more difficulties in clinical trials. Nathan 
Maniel: Effects of non-linearity on subjective and objective analyses of hormonal 
assay data. 


INVITED ADDRESS—II. 


Chairman: S. B. Litiauer-A. Hald: Sampling inspection, prior distributions 
and cost. 
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SPECIAL TOPICS—II. 


Chairman: J. Wolfowitz-J. Kampé de Keriet: Estimation of thecorrelation of 
non-stationary random functions. B. Mandelbrot: Statistical sufficiency and the 
foundations of thermodynamics. Ronald Pyke: Markov renewal processes of zero 


order. 
CHANGES IN MEMBERSHIP 
(April 15-July 1, 1960) 
Changes of Address 


Miss J. Anderson, c/o Mrs. F. C. Anderson, 46, Eversley Terrace, Yeronga, S. 10, 
Brisbane, Australia. 

Mr. Berkeley Basil, Ward Blenkinsop and Company, Ltd., Anglo Estat, @hepton 
Mallet, Somerset, England. 

Dr. Samuel H. Brooks, 510 16th Street, Santa Monica, California, U.S. A. 

Dr. Alfred F. Butler, Changuinolé Research Station, Chiriqui Land Company, 
Almirante, Panama. 

Dr. Patrick Colin Butler, P. O. Box 237, Norwood, Massachusetts, U. S. A. 

Dr. Hugh Bishop Cannon, Statistical Research Service, Canada Dept. of Agriculture, 
Ottawa, Ontario, Canada. 

Mr. Frederick L. Carter, Jr., 7980 New Riggs Road, Apt. 205, Adelphi, Maryland, 
U.8.A. 

Mr. Frank B. Cramer, 20508 Moberly Place, Canoga Park, California, U. S. A. 

Dr. Angelo Cresseri, Via Fogazzaro 27, Milano, Italy. 

Miss S. V. Cunliffe, Silver Birches, Harriotts Lane, Ashtead, Surrey, England. 

Dr. Gian Maria Curto, Via Giulio Cesare 11, Bergamo, Italy. 

Mr. Mare Dalebroux, B. P. 49, Yangambi I, Belgian Congo. 

Dr. John Stapley de Cani, 114 West Mount Airy Avenue, Philadelphia 19, Pennsy!- 
vania, U.S. A. 

Mr. William F. Dossett, 5596 Canalino Drive, Carpenteria, California, U. S. A. 

Miss Lillian Elveback, Div. of Epidemiology, P. H., Research Institute of City of 
New York, New York 9, N. Y. 

Mr. J. Francois, 110, ave de Namur, Louvain, Belgium. 

Dr. Edison Galvao, Faculdade de Filosofia, Ciencias e Letras, Araraquara-l‘. S. 
Paulo, Brazil. 

Mr. Ivan K. Hadlock, 888 Irvine Street, Bronx 59, New York, U.S. A. 

Dr. W. Hartmann, Badehausallee 29, Cuxhaven, Germany. 

Dr. Rex L. Hurst, Statistical Laboratory, Utah State University, Logan, Utah, 
A. 

Mr. Arthur G. Itkin, 7745-a Washington Lane, Elkins Pk., Pennsylvania, U. S. A. 

Dr. Fred M. Jacobsen, Research and Development Department, Standard Oil 
Company, Box 431, Whiting, Indiana, U.S. A. 

Mr. S. K. Katti, Statistical Laboratory, Florida State University, Tallahassee, 
Florida, U.S. A. 

Mr. James Kerr, Technical Computing, Minnesota Mining Mfg. Co., St. Paul 6, 
Minnesota, U.S. A. 

Mr. Marcus Kjelsberg, Biostatistics Division, University of Minnesota, Minne- 
apolis 14, Minnesota, U. S. A. 

Prof. Tsuneo Koda, Higashi-Oizuma-machi 829, Nerima-ku, Japan. 

M. Philippe Lazar, 24 quai de Passy, Paris 16e, France. 
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Dr. Kuo Hwa Lu, Child Study Clinic, University of Oregon Dental School, Port- 
land 1, Oregon, U.S. A. 

Dr. Samuel B. Lyerly, 1703 East West Hwy., Apt. 211, Silver Spring, Maryland, 
U.S. A. 

Mr. Aden C. Magee, School of Home Economics, Women’s College, UNC, Greens- 
boro, North Carolina, U.S. A. 

Mr. Leonard A. Marascuilo, Department of Biostatistics, University of California, 
Berkeley, California, U.S. A. 

Miss Ethelyne L. McBee, Box 4723, Tucon, Arizona, U. S. A. 

Dr. Mario Morea, Via B. Gonzati 15, Padova, Italy. 

M. Jean Mothes, 7 place Foch, Enghien-Les-Bains, (Seine-et-Oise) France. 

Dr. D. Neumann, Akazienweg 2, Rostock, Germany. 

M. Jean Pagot, 123, rue de |’ Universite, Paris 7e, France. 

Dr. William E. Reynolds, California State Department of Public Health, 2151 
Berkeley Way, Berkeley 4, California, U.S. A. 

Dr. Marcel P. Schutzenberger, Rue Velpeau, Batiment 1, La-Croix-de Berny 
(Seine) France. 

Dr. Hilary L. Seal, Osborn Zoological Laboratory, Yale University, New Haven, 
Connecticut, U. S. A. 

Mr. Maurice Simon, “Le claney”’ Bostenberg, Bost lez Tirlemont, Belgium. 

Lt. Donald B. Siniff, 5040th Radar Eval, Fit., APO 942, Seattle, Washington, 
U.S. A. 

Dr. Harry Smith, Jr., 2073 State Rt. 131, Box 239, R. R. 3, Batavia, Ohio, U.S. A. 

Mrs. Grace Scholz Spitz, 800 Fourth Street, S. W., Washington 24, D. C., U.S. A. 

Professor Antonio Tizzano, Istituto di Igiene, Universita degli Studi, Napoli, Italy. 

Dr. U. Vogliazzo, Ospedale Mauriziano, Aosta, Italy. 

Dr. B. T. Warner, Parke, Davis and Company, Staines Road, Hounslow, Middlesex, 
England. 

Dr. Frank A. Weymouth, 1559 Posen Avenue, Berkeley 7, California, U. S. A. 

Dr. Evan J. Williams, Box 71, G. P. O., Canberra, A. C. T., Australia. 


New Members 
At Large 


Mr. Sydney E. Cruise, Department of Mathematics, University of Natal, Durban, 
South Africa. 


Australasian Region 

Dr. G. Gregory, Department of Statistics, University of Melbourne, Melbourne, 
Victoria, Australia. 

Belgian Region 

Prof. Bastenier, 4 Rue du Bailli, Brussels 5, Belgium. 

Dr. Roger Beckerj, rue de la Grange 51, Bruger, Belgium. 

Mr. Paul M. J. Bonnivair, 34 Rue Al-Boom, Wemmel-lez-Bruxelles, Belgium. 

Mr. Roger P. J. Bontinck, Chaussee de Ninove 70, Dilbeek (Brabant) Belgium. 

Dr. Julien Braun, 38 rue General Eenens, Brussels 3, Belgium. 

Dr. Jean-Marie Cordier, 76 rue des Combattants, Forchies-La-Marche, Hainaut, 
Belgium. 

Miss Crombez, 201 Rue Bolliard, Brussels, Belgium. 

Dr. Fernand de Smet, 129 Avenue Charles Woeste, Brussels, Belgium. 
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Mr. C. Donis, 35 av. Jeanne, Brussels, Belgium. 

Mrs. Dony, 20 Avenue Eugene Godeaux, Brussels 15, Belgium. 

Prof. Fernand Hoff, 231 rue de l’Eglise a Moha, Belgium. 

Dr. Jeurissen, Franz Day straat, Buizingen, Belgium. 

Dr. Gerard L. P. Laets, 38 rue R. Chalon, Belgium. 

Mrs. Suzy LeClerq, 49 Avenue du Juillet, Brussels 15, Belgium. 

Miss Berthe L’Hoest, 450 chssie de Waterloo, Brussels 6, Belgium. 

Mr. Ferdinand Michel, 48 rue du Luxembourg, Brussels, Belgium. 

Miss Renoz, 154 au. des Combattants, Genval, Belgium. 

Mr. Andre O. F. Scaut, BP 135, I. N. E. A. C., Yangambi, Belgian Congo. 

Dr. Albert Soenen, Rue du Village, Gorsem-St. Trond, Belgium. 

Mr. Jean E. Stuyvaert, c/o Cotonco—Vuira, Sudkivu, Belgian Congo. 

Mrs. Marie Therese Van der Venne, 2 we General Molitor, Brussebs 4, Belgium. 

Mr. Fernand Vanhamme, 33 Rue Paul Lauters, Brussels, Belgium. 

Miss Cecile Van Kerchove, Rue Orchimede 15, Brussels, Belgium. 

Prof. Dr. M. Van Miegroet, Rijkalandbouwhogeschool, Coupure Links 233, Gent, 
Belgium. 

Mr. Thierry Waffelaert, c/o INEAC, Monthawa, Dp. Bunia Ituri, Belgian Congo. 

Mrs. Hedwige Wullen, 20 Bd. Louis Schmidt, Brussels 4, Belgium. 

British Region 

Miss Eleanor M. Cooper, Grassland Research Institute, Hurley Near Maidenhead, 
Berkshire, England. 

Mr. D. E. Davies, Anti-Locust Research Centre, 1 Princes Gate, London S. W. 7, 
England. 

Miss Shiela M. Green, Grassland Research Institute, Hurley, Near Maidenhead, 
Berkshire, England. 

Dr. Alan Huitson, Brunel College of Technology, Woodlands Avenue, Acton, 
London W. 3, England. 

Dr. N. L. Johnson, Department of Statistics, Univ. College, Gower Street, London 
W. C. 1, England. 

Mr. I. B. Jones, Anti-Locust Research Centre, 1 Princes Gate, London S. W. 7, 
England. 

Miss V. Lofthouse, 54 Lensfield Road, Cambridge, England. 

Dr. L. K. O’Connor, Milk Marketing Board, Thames Ditton, Surrey, England. 


ENAR 


Dr. John E. Alman, Boston University, 725 Commonwealth Avenue, Boston 15, 
Massachusetts, U.S. A. 

Mr. Gerald Darwin Berndt, 711 LeMay Drive, Bellevue, Nebraska, U.S. A. 

Dr. Philip J. Clark, Department of Zoology, Michigan State University, East 
Lansing, Michigan, U. S. A. 

Mr. E. P. Cunningham, 208 Williams Street, Ithaca, New York, U.S. A. 

Mr. Richard A. Damon, Jr., Biometrical Services, ARS, Agricultural Research 
Center, Beltsville, Maryland, U.S. A. 

Mr. L. Lee Eberhardt, Game Division, Michigan Regaetnent of Conservation, 
Lansing 26, Michigan, U. 8S. A. 

Dr. Dale O. Everson, Biometrical Services, ARS, Agricultural Research Center, 
Beltsville, Maryland, A. 

Dr. Alan M. Gittelsohn, Director of Health Statistics, N. Y. State Dept. of Health, 
Albany 8, New York, U.S. A. 
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Miss Susan J. Gordon, Lemuel Shattuck Hospital, 170 Morton Street, Jamaica 
Plain, Massachusetts, U. 8. A. 

Mr. John R. Hatfield, Eli Lilly and Company, Indianapolis 6, Indiana, U. 8. A. 

Miss Emmarie C. Hemphill, Div. of General Medical Sciences, National Institutes 
of Health, Bethesda 14, Maryland, U. S. A. 

Mr. Paul E. Leaverton, Jr., 1532 Hawthorn Apts., Ames, Iowa, U.S. A. 

Dr. Robert Perloff, Department of Psychology, Purdue University, Lafayette, 
Indiana, U.S. A. 

Mr. Wolf Prensky, 109 Davenport Hall, Department of Agronomy, Urbana, Illinois, 
U.S. A. 

Mr. Daniel Smith, Interchemical Corporation, 432 West 45 Street, New York 36, 
N. Y., U.S. A. 

Dr. M. E. Terry, Room 3D-202, Bell Telephone Laboratories, Murray Hill, New 
Jersey, U.S. A. 

Mrs. Marguerite T. Wood, 900 Quincy Street, N. W., Washington 11, D. C., U.S. A. 


French Region 

Ing. Agr. Charles Kiss, Centre de Recherches Agronomiques, de Massif Central 
Route de Lyon, (Puy-de-Dome), Clermont-Ferrand, France. 

Mr. N. Rouanet, Charge d’etudes au Centre d’etudes et Recherches Psychotech- 
niques 13, rue Paul-Chautard, Paris 15e, France. 


German Region 

Dr. O. Ewert, Psychol. Inst. d. Univ., Mainz, Germany. 

Dr. Lucie Osadnik, Ferdinand-Rhode Str. 10, Leipzig C1, Germany. 

Dipl. Math. W. Schmidt, Unt. Wartestrasse, Wiesbaden, Germany. 

Dr. Hubert Walter, Am Fort Elisabeth 17/1, Mainz, Germany. 

India 

Dr. A. R. Kamat, Gokhale Inst. of Politics and Economics, Poona-4, India. 

Mr. A. S. Rawat, Research Officer, Stat. Branch, Forest Research Institute, New 
Forest P. O., Dehra Dun, India. 

Mr. R. 8. Singh, Director of Agriculture, Statistical Section, Hari Bhawan 7, Laxmi 
Bai Marg, Lucknow, India. 

Japan 

Mrs. Shyoko Hashiguchi, National Inst. of Agri. Sci., Nishigahara-2, Kita-ku, 
Tokyo, Japan. 


Netherlands 


Ir. A. R. Bloemena, Mathematisch Centrum, Amsterdam, Netherlands. 

Mr. J. Borst, arts. Moreelsestraat 4, Amsterdam (z) Netherlands. 

Mr. H. de Jonge, Institut voor Preventieve geneeskunde, Leiden, Netherlands. 

Ir. L. de Rijke, Bibliotheek Ministerie van Landbouw, den Haag, Netherlands. 

Miss A. Verbeek, c/o Mathem. Centrum, 2c Boerhavestraat 49, Amsterdam, Nether- 
lands. 


Sweden 


Dr. Claus Rerup, Department of Pharmacology, The Royal University, Lund, 
Sweden. 


Switzerland 
Dr. Paul Schmid, Mohrilstr. 31, Zurich 6, Switzerland. 
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WNAR 


Dr. Hans Abplanalp, Dept. of Poultry Husbandry, University of California, Davis, 
California, U.S. A. 

Dr. David F. Bray, Dept. of Poultry Science, University of British Columbia, 
Vancouver 8, B. C., Canada. 

Miss Sadako Hayase, 11952 Goshen Avenue, Los Angeles 49, California, U. S. A. 

Professor Gwilym Jenkins, Department of Statistics, Stanford University, Stanford, 
California, U.S. A. 

Dr. Craig Magwire, 2954 Winchester Way, Rancho Cordova, California, \ J A. 

Dr. Richard A. Pimentel, Biological Sciences Department, California State Poly- 
technic College, San Luis Obispo, California, U.S. A. 

Dr. Fred T. Shultz, 1378 S. Livermere Avenue, Livermore, California, U. S. A. 

Mr. Donald V. Sisson, Department of Applied Statistics, Utah State University, 
Logan, Utah, U.S. A. 
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NEWS AND ANNOUNCEMENTS 


Members are invited to Transmit to Their National or Regional Secretary (if members 
at large, to The General Secretary) news of appointments, distinctions, or retirements, 
and announcements of professional interest. 


UNIVERSITY OF MINNESOTA DEPARTMENT OF STATISTICS 


During the academic year 1958-1959 the Department of Statistics was estab- 
lished at the University of Minnesota. The Department has supplemented and 
coordinated the statistical activities of the University—graduate curriculum, 
research, and consulting. The Department’s organization involves direct appoint- 
ments as well as joint appointments in mathematics and the sciences. Following 
is the current staff of the Faculties of Statistics: STATISTICS: L. Hurwicz, I. 
Olkin, D. Richter, I. R. Savage, M. Sobel; MATHEMATICS: G. Baxter, M. 
Donsker, B. Lindgren, 8. Orey, W. Pruitt, E. Reich, F. Spitzer; AGRICULTURE: 
R. Comstock, C. Gates; BIOSTATISTICS: J. Bearman, J. Berkson, B. Brown, 
E. Johnson, R. McHugh; BUSINESS ADMINISTRATION: D. Hastings, J. 
Neter; INDUSTRIAL ENGINEERING G. McElrath. 


NEWS ABOUT MEMBERS 


E. J. Gumbel participated in the meeting of the International Statistical Insti- 
tute in Tokyo, and presented on the theory of extreme values at the Universities 
of Kyoto, Osaka, Toyko, and Manila. During July he gave a course in mathematical 
statistics at the Chulalongkorn University, Bangkok, Thailand. 

Fred M. Jacobsen, technical computing supervisor in the research and develop- 
ment department of Standard Oil Company, was transferred from Whiting, Indiana 
to Texas City, Texas. 

Eileen B. Karsh has been awarded a USPHS postdoctoral research fellowship 
in the Department of Psychology at the University of Pennsylvania. 

Allyn W. Kimball, formerly Chief of the Statistics Section of the Mathematics 
Panel at the Oak Ridge National Laboratory, has been appointed Professor and 
Head of the Department of Biostatistics in the School of Hygiene and Public Health 
of Johns Hopkins University in Baltimore. He will be filling the position vacated 
by Jerome Cornfield who recently returned to the National Institutes of Health. 

Katherine B. Ladd is now employed as a statistician in the Division of Maternal 
and Child Health of the Province of Ontario, Canada. She was formerly Assistant 
Professor of Biostatistics at the University of Vermont College of Medicine. 


EDUCATIONAL TESTING SERVICE FELLOWSHIPS 


The Educational Testing Service is offering for 1961-62 its fourteenth series 
of research fellowships in psychometrics leading to the Ph.D. degree at Princeton 
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University. Open to men who are acceptable to the Graduate School of the Univer- 
sity, the two fellowships each carry a stipend of $3,750 a year, plus an allowance 
for dependent children. These fellowships are normally renewable. Fellows will be 
engaged in part-time research in the general area of psychological measurement at 
the offices of the Educational Testing Service and will, in addition, carry a normal 
program of studies in the Graduate School. 

The closing date for completing applications is January 6, 1961. Information 
and application blanks will be available about September 15 and may be obtained 
from: Director of Psychometric Fellowship Program, Educational Testing Service, 
20 Nassau Street, Princeton, New Jersey. 


Tables of the 
Hypergeometric Distribution 


GERALD J. LIEBERMAN AND DONALD B. OWEN. In this book the hyper- 
geometric distribution is fully tabulated. The tables are of great use to 
workers in applied mathematics, engineering, and the social and phys- 
ical sciences. About $15.00 


Markov Learning Models 
For Multiperson Interactions 


PATRICK SUPPES AND RICHARD C. ATKINSON. A major attempt to bridge 
the gap between statistical learning theory and social psychology by 
applying stimulus-sampling techniques to social interaction situations. 
Stanford Mathematical Studies in the Social Sciences, V. About $8.00 


STANFORD UNIVERSITY PRESS 
Order from your bookstore, please 
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JOURNAL OF THE 
AMERICAN STATISTICAL 
ASSOCIATION 
Volume 55 September 1960 Number 291 


TABLE OF CONTENTS 


Competing Exponential Risks, with Particular Reference to the Study of 


Smoking and Lung Cancer....... Lita ELVEBACK AND JosEPH BERKSON 
Statistical Evaluation of Cloud Seeding Operations........ K. A. BROWNLEE 


Mathematical Models for Ranking from Paired Comparisons. ..H. D. Brunk 


The use of Statistics in the Formulation and Evaluation of Social Programmes 
Octavio CaBELLO 


A Method of Analyzing Log-Normally Distributed Survival Data with 


Consumers’ Propensities to Hold Liquid Assets........ Haroip W. GuTHRm 
Bibliography on Sequential Analysis............ oem J. Epwarp Jackson 
On the Choice of Plotting Positions on Pechebility Paper 

Braprorp F. 
Early Failures in Life Testing...  .............. Rorergt G. Jr. 


Tables of Confidence Limits for the Binomial Distributions. . James Pacnares 


A Nonparametric Sum of Ranks Procedure for Relative Spread in Unpaired 


For further Information, please contact: 
AMERICAN STATISTICAL ASSOCIATION 
1757 K Street, N. W., Washington 6, D. C. 
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BIOMETRIE—PRAXIMETRIE 


Tome I, N° 1 January, 1960 


TABLE OF CONTENTS 


La probabilité mathématique dans les sciences naturelles. .Sir R. A. FisHEer 
Traduction: L. Martin 


Relation entre i’incidence des Antestiopsis et les dégits sur Caféier 


Ajustement d’un faisceau de systémes de polynémes orthogonaux. Applica- 
tion au développement de l’index histaminolytique au cours de la 


La vie de la Société 
Personalia 


Troisitme Journée BiométriquegGembloux, 26 juillet 1960) 


Tome I, N° 2 April, 1960 


TABLE OF CONTENTS 


Les levures sélectionées en cidrerie—Variations de certains de leurs carac- 
téres selon les espéces de pommes utilisées 


Analyse de données non orthogonales dans le cas d’un expérience 4 deux 


Etude biométrique de la production en cénes et en graines du pin de 
Koekelare—Critéres de sélection ........... ANNE Lencer Et P. GatHy 


Blocs randomisés et interactions M. Daesroux 


Publishers for the 


SOCIETE ADOLPHE QUETELET 
7 rue Héger-Bordet, Brussels 1, Belgium 


Presses Universitaires de Bruxelles 
50, av. Franklin Roosevelt 
Brussels 
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TECHNOMETRICS 


A Journal of Statistics for the 
Physical, Chemical, and Engineering Sciences 


Vol. 2, No. 1 February 1960 


CONTENTS 


Some Remarks on Wild Observations 
Statistical Estimation of the Gasoline Octane Number Requirement of New 
Model Automobiles.................C. Brrnecar R. R. MILLER 
The Effect of Sequential Batching for Acceptance—Rejection Sampling 
Upon Sample Assurance of Total Product Quality 
M. anv G. L. Burrows 
Elements of the Theory of Extreme Values 
System Efficiency and Reliability R. E. Bartow anv L. C. HunTER 
Aids for Fitting the Gamma Distribution by Maximum Likelihood 
J. A. GREENWOOD AND D. Duranp 
Experimental Designs to Adjust for Time Trends Hupsert M. Hii 
Tests for the Validity of the Assumption that the Underlying Distribution of 
Life is Exponential, Part I 
Programming Fisher’s Exact Method of Comparing Two Percentages 
W. H. Ropertson 
Misclassified Data from a Population. ...A. CoHEN, JR. 


Vol. 2, No. 2 May 1960 


CONTENTS 


‘Locating Outliers in Factorial Experiments 
Discussion of the Papers of Messrs. Anscombe and Daniel 
W. H. Krusxat, T. 8. Fercuson, J. W. Tukey anp E. J. Gumpex 
Test for the Validity of the Assumptions that the Underlying Distribution of 
Life is Exponential, Part IT 
The Partial Duplication of Response Surface Designs ; 
A Rank Sum Test for Comparing all Pairs of Treatments...R. G. D. Sree. 
The Percentile Points of Distribution Having Known Cumulants 
R. A. Fisoer anv E. A. Cornisa 
An Approximation to the Negative Moments of the Positive Binomial Useful 
in Life Testing W. MENDENHALL AND L. H. LEHMANN 
Order Statistics from the Gamma Distribution................ 8. S. Gupra 
Parallel Fractional Replicates C. DANIEL 


Technometrics is published quarterly in February, May, August, and 
November. The annual non-member subscription rate is $8.00. To members 
of the American Statistical Association and the American Society for Quality 
Control the rate is $6.00. Checks should be made payable to Technometrics 
and addressed to Technometrics, Post Office Box 587, Benjamin Franklin 
Station, Washington 6, D. C. 


iq® 
{ 


‘Sete 
iy 


INFORMATION FOR CONTRIBUTORS 


MANUSCRIPTS 


Contributions for Biometrics may be addressed to Dr. Ralph A. Bradley, Depart- 
ment of Statistics, The Florida State University, Tallahassee, Florida, U.S.A.; 
authors residing in the following Society Regions can expedite consideration of papers 
by submitting them to the appropriate Associate Editor, namely; BRITISH RE- 
GION: Dr. S. C. Pearce, East Malling Research Station, East Malling, Maidstone, 
Kent, England; AUSTRALASIAN REGION: Dr. E. A. Cornish, University of 
Adelaide, Adelaide, Australia; FRENCH REGION: Dr. Georges Teissier, Faculté 
des Sciences de Paris, 1 rue V. Cousin, Paris, France. QUERIES, NOTES, and 
related correspondence should be directed to Dr. D. J. Finney, Department of 
Statistics, University of Aberdeen, Meston Walk, Old Aberdeen, Scotland. Books 
and material for Book Reviews should be sent to Mr. J. G. Skellam, The Nature 
Conservancy, 19 Belgrave Square, London, S.W. 1, England. 

MANUSCRIPTS must be submitted in triplicate, with typescript doublespaced 
throughout. Marginal notes may obviate typographical difficulties presented by 
complicated formulae or tables—authors should not attempt editorial instructions 
or markings for the printer. TABLES should be identified by arabic number and 
by a short descriptive title. ILLUSTRATIONS should also be identified by arabic 
number and by a brief caption. (Captions should not be included in illustrations, 
but should be typewritten collectively on an accompanying sheet.) Originals 
should be approximately 8.5 x 11 in. (21.5 x 28 cm.). The original of each chart, 
diagram, or graph should be executed in black on white drawing paper or board, on 
blue tracing linen, or on coordinate paper ruled in blue only; coordinate lines to be 
reproduced should be ruled in black. For printing, illustrations may be reduced to 
¥% or \ original dimensions. Lines should therefore be of sufficient thickness, and 
decimal points, periods, and stippled dots should be solid black circles large enough 
to reproduce well. Lettering and numerals should be at least 1 mm. high when 
reproduced in a cut 3 in. (7.5 cm.) wide. Photographs should be prints on glossy 
paper with strong contrasts, and if grouped in a plate should be mounted contig- 
uously. All tables and illustrations should be mentioned explicitly in the text. 
REFERENCES (BIBLIOGRAPHIC) should be collectively listed alphabetically 
by author; textual citation by author and year is preferred. 


ABSTRACTS 


Abstracts of papers presented at meetings of the Biometric Society or of its 
regions are printed in Biometrics following such meetings. They should be submitted 
to the person designated to receive them for a particular meeting in exactly the form 
published in Biometrics (except for an Abstract Number), doublespaced on bond 
paper, and in duplicate. Use of formulae requiring display printing is to be avoided. 


Notices, ANNOUNCEMENTS, AND BrioMEtRIc Society REPORTS 


International and regional reports and notices should be submitted by the 
appropriate officers of the Society and its Regions in duplicate doublespaced on 
separate sheets exactly as they are to be printed in Biometrics. Other material to 
be printed in News jand Announcements should also be submitted doublespaced 
and in duplicate. + 


SusTaInInc MEMBERS OF THE BIOMETRIC SocIETy 
Abbott Laboratories 

American Cancer Society, Inc. 

General Foods Corporation, Research Center 

Heisdorf and Nelson Farms, Inc. 

Merck, Sharp and Dohme Research Laboratories 

Schering Corporation 

Smith, Kline and French Laboratories 

E. R. Squibb and Sons 

The Upjohn Company 

Wallace Laboratories, Division of Carter Products 

Wyeth Institute of Applied Biochemistry 
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BACK ISSUES 


Back issues of Biometrics are available at the following postage-paid 
prices in U.S.A. currency: 


Price per Price per 
Year Volume Number Single Number Volume (unbound) 


1to6 
1 to 6 
lto4 
lto4 
lto4 
lto4 
lto4 
1to4 
1to4 
1lto4 
1to4 
1to4 
1to4 
1to4 
1to4 
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Reprints of individual articles are not available except to authors at the 
time of printing. Three special issues are among the numbers listed 
above. They are: 


1947 Volume 3 Number 1 The Analysis of Variance 
1951 Volume 7 Number 1 Components of Variance 
1957 Volume 13 Number 3 The Analysis of Covariance 


Also available are: 
Fishery Reprint Series (Selected reprints from Vol. 5) $1.00 
Subject Index (Volumes 1-10) 1.00 
Proceedings, International Biometric Symposium, 
Campinas, Brazil, 1955. 1.00 


Inquiries, non-member subscriptions, and orders for back issues and 
other material listed above should be addressed to: Biomerrics, DEPART- 
MENT OF Statistics, THE Forma State UNIversity, TALLAHASSEE, 
Fiori, U.S.A. 
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